Do Modern Problems Require Modern Solutions? This Study Examines the Effectiveness of MLPs (Multi-Layer Perceptrons) in Modern Settings

Deep learning is everywhere. It has revolutionized many fields, and the trend shows no sign of slowing. Deep learning is a subfield of machine learning that enables computers to learn and make predictions from large amounts of data. Its key component is the artificial neural network, a model inspired by the structure and functioning of the human brain.

The ability of deep learning to handle high-dimensional, unstructured data and its remarkable performance in complex tasks have made it a cornerstone of modern AI applications, driving advancements in fields ranging from healthcare and finance to robotics and computer vision.

Deep learning algorithms leverage large-scale computational resources, such as graphics processing units (GPUs), to efficiently train and optimize models with millions or even billions of parameters. Nowadays, practitioners typically start from large, pre-trained models that are fine-tuned for specific tasks. While these models, such as the Transformer architecture for natural language processing and convolutional and transformer-based models for computer vision, have achieved remarkable empirical success, their theoretical understanding lags behind.


The disconnect between theory and practice is growing. Because of their mathematical simplicity, multi-layer perceptrons (MLPs) are often used as test beds for analyzing empirical phenomena observed in more complex models. However, there is a scarcity of empirical data on MLPs trained on benchmark datasets, and few studies examine them in pre-training or transfer learning settings.

This knowledge gap raises a fundamental question: to what extent can MLPs keep up with modern neural network architectures when trained in modern settings? Let’s take a look.

Researchers from ETH Zürich conducted a series of experiments to evaluate the performance of MLPs in modern settings. They also explored the role of inductive bias and the impact of scaling compute resources on MLP performance. By pushing the empirical performance limits of deep networks composed of multiple MLP blocks, they aim to bridge the gap between theoretical understanding and practical advancements.
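To make the idea of "multiple MLP blocks" concrete, here is a minimal PyTorch sketch of one plausible building block: a residual block with layer normalization and a GELU nonlinearity. The widths, depth, and activation here are illustrative assumptions, not the authors' exact configuration (see their released code for that):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLPBlock(nn.Module):
    """One residual MLP block: normalize, expand, apply GELU, project back."""
    def __init__(self, dim: int, expansion: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.fc1 = nn.Linear(dim, expansion * dim)
        self.fc2 = nn.Linear(expansion * dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.fc2(F.gelu(self.fc1(self.norm(x))))

# A deep MLP for, e.g., 32x32 RGB images: flatten, embed, stack blocks, classify.
model = nn.Sequential(
    nn.Flatten(),                          # images become plain vectors (3*32*32 = 3072)
    nn.Linear(3072, 512),
    *[MLPBlock(512) for _ in range(6)],
    nn.Linear(512, 10),
)
```

Stacking more blocks or widening them is how such a model is scaled up in compute.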

MLPs are known for lacking inductive bias: because they consume a flattened input through fully connected layers, they are invariant to permutations of the pixels. Recent architectures, such as the Vision Transformer (ViT) and the MLP-Mixer, already challenge the importance of strong inductive bias for achieving high performance. This study aims to determine whether the absence of inductive bias can be compensated for by scaling compute resources, shedding light on the relationship between inductive bias, compute, and performance in MLPs.
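To see what "no inductive bias" means in practice, here is a small NumPy sketch (not from the paper): any fixed permutation of the pixels can be absorbed into a reordering of the first-layer weights, so the MLP model family is blind to spatial structure.

```python
import numpy as np

rng = np.random.default_rng(0)
d, hidden = 3 * 32 * 32, 256             # a flattened 32x32 RGB image
x = rng.normal(size=d)                    # one "image" as a flat vector
W = rng.normal(size=(hidden, d))
b = rng.normal(size=hidden)

perm = rng.permutation(d)                 # a fixed pixel scrambling
x_scrambled = x[perm]
W_reordered = W[:, perm]                  # reorder first-layer columns the same way

out = np.maximum(W @ x + b, 0.0)                          # original first layer (ReLU)
out_scrambled = np.maximum(W_reordered @ x_scrambled + b, 0.0)

print(np.allclose(out, out_scrambled))    # True: the MLP cannot "see" pixel geometry
```

A CNN, by contrast, bakes in locality and translation structure, which such a scrambling would destroy.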

Concretely, the study investigates the behavior of MLPs and compares their empirical performance to practical models like Transformers and MLP-Mixers. The authors hypothesize that MLPs can reflect the empirical advances exhibited by practical models and that their lack of inductive bias can be compensated for by scaling compute resources.

The research provides several key findings. To start with, it shows that MLPs exhibit behavior similar to their modern counterparts when subjected to scale. It also highlights the importance of regularization and data augmentation, and the role of SGD’s implicit bias, which differs from that observed in CNNs. Surprisingly, larger batch sizes are found to generalize better for MLPs, challenging their role as proxies for investigating the implicit bias of SGD. Although the scale employed in this work is insufficient to show that a lack of inductive bias is actively beneficial for performance, it does demonstrate that sufficient scale can overcome the weak inductive bias of MLPs, leading to surprisingly strong downstream performance.
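As a rough illustration of what a "modern setting" can look like for an MLP, the sketch below combines data augmentation, a large batch size, and label smoothing as a simple regularizer. The dataset, hyperparameters, and optimizer choice are assumptions for illustration, not the paper's exact training recipe:

```python
import torch
import torch.nn as nn
from torchvision import datasets, transforms

# Augmentation pipeline: crop/flip the image, then flatten it for the MLP.
augment = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Lambda(torch.flatten),
])

train_set = datasets.CIFAR10("data", train=True, download=True, transform=augment)
loader = torch.utils.data.DataLoader(train_set, batch_size=4096, shuffle=True)

mlp = nn.Sequential(nn.Linear(3072, 1024), nn.ReLU(),
                    nn.Linear(1024, 1024), nn.ReLU(),
                    nn.Linear(1024, 10))
opt = torch.optim.SGD(mlp.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
loss_fn = nn.CrossEntropyLoss(label_smoothing=0.1)   # light regularization

for images, labels in loader:             # one epoch shown for brevity
    opt.zero_grad()
    loss_fn(mlp(images), labels).backward()
    opt.step()
```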

In summary, this research demonstrates that even “bad” architectures like MLPs can achieve strong performance when given enough compute resources, with a shift towards investing more compute in dataset size rather than model size.


Check out the Paper and Code.



Ekrem Çetinkaya received his B.Sc. in 2018, and M.Sc. in 2019 from Ozyegin University, Istanbul, Türkiye. He wrote his M.Sc. thesis about image denoising using deep convolutional networks. He received his Ph.D. degree in 2023 from the University of Klagenfurt, Austria, with his dissertation titled “Video Coding Enhancements for HTTP Adaptive Streaming Using Machine Learning.” His research interests include deep learning, computer vision, video encoding, and multimedia networking.



