A New Study by Google and DeepMind Introduces Geometric Complexity (GC) for Analyzing and Understanding Deep Learning Models

Understanding how regularisation affects the properties of learned solutions is a growing research topic, and a particularly crucial one for deep learning. Regularisation can take many forms: it may be included explicitly as a penalty term in the loss function, or arise implicitly through the choice of hyperparameters, model architecture, or initialization. In practice, regularisation is routinely used to control model complexity, pushing a model toward simple solutions rather than complicated ones, even though these forms of regularisation are not usually analytically tractable.

To understand regularisation in deep learning, a clear definition of model “complexity” for deep neural networks is needed. Complexity theory offers many ways of gauging a model’s complexity, but these measures run into problems when applied to neural networks. The recently observed phenomenon of ‘double descent’ is a striking example: heavily overparameterized neural networks can interpolate the training data, which classically would be taken as evidence of overfitting, and yet still achieve minimal test error.

A new study by Google and DeepMind researchers introduces Geometric Complexity (GC), a new measure of model complexity with properties well suited to studying deep neural networks. Through theoretical and empirical analysis, the researchers show that a wide variety of regularisation and training heuristics control geometric complexity through a variety of mechanisms. These standard training heuristics include:

  1. Overparameterized models with many layers
  2. Common initialization schemes
  3. Larger learning rates, smaller batch sizes, and implicit gradient regularisation
  4. Explicit flatness, label-noise, spectral-norm, and parameter-norm regularisation.

Their findings demonstrate that the geometric complexity captures the double-descent behavior seen in the test loss with increasing model parameter count.
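Roughly speaking, the paper measures geometric complexity as the squared Frobenius norm of the network’s input-output Jacobian, averaged over the training examples (a discrete Dirichlet energy of the model over the data). The snippet below is a minimal sketch of that quantity for a toy ReLU network written in JAX; the architecture, data sizes, and function names are illustrative placeholders rather than the authors’ code.

```python
# Minimal sketch (not the authors' code): geometric complexity as the mean
# squared Frobenius norm of the network's input-output Jacobian over a batch.
import jax
import jax.numpy as jnp


def init_mlp(key, sizes):
    """Initialize a small fully connected ReLU network (illustrative only)."""
    params = []
    for d_in, d_out in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        params.append((jax.random.normal(sub, (d_in, d_out)) / jnp.sqrt(d_in),
                       jnp.zeros(d_out)))
    return params


def mlp(params, x):
    """Forward pass of the toy ReLU network on a single example."""
    for w, b in params[:-1]:
        x = jax.nn.relu(x @ w + b)
    w, b = params[-1]
    return x @ w + b


def geometric_complexity(params, xs):
    """Batch average of ||d f(x)/d x||_F^2: the squared Frobenius norm of the
    Jacobian of the network output with respect to its input."""
    def sq_frob(x):
        jac = jax.jacobian(lambda inp: mlp(params, inp))(x)  # (out_dim, in_dim)
        return jnp.sum(jac ** 2)
    return jnp.mean(jax.vmap(sq_frob)(xs))


key = jax.random.PRNGKey(0)
params = init_mlp(key, [16, 64, 64, 10])   # toy sizes, not from the paper
xs = jax.random.normal(key, (32, 16))      # stand-in for a training batch
print(float(geometric_complexity(params, xs)))
```

For a purely linear model f(x) = Wx + b, the Jacobian is W everywhere, so this quantity reduces to the squared Frobenius norm ||W||², which makes the connection to familiar parameter-norm regularisation concrete.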

During model training, the researchers employ stochastic gradient descent without momentum, to isolate the effects under investigation from the optimization strategy. To prevent masking effects, they investigate the influence of one training heuristic at a time on geometric complexity; as a result, the main body of the paper uses no data augmentation or learning rate schedules. In the supplementary material (SM), they replicate most of the experiments with SGD with momentum and with Adam, and find comparable results. They also observe the same behavior of the geometric complexity in a setting where a learning rate schedule, data augmentation, and explicit regularisation are all used together to boost model performance.
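Continuing the toy sketch above (and reusing its `mlp`, `init_mlp`, and `geometric_complexity` helpers), the snippet below illustrates that kind of stripped-down setup: plain SGD updates with no momentum, no schedule, and no augmentation, with geometric complexity logged as training proceeds. The loss, learning rate, and synthetic data are placeholders, not the paper’s experimental settings.

```python
# Continuation of the sketch above; reuses mlp, init_mlp and geometric_complexity.
# Vanilla SGD (no momentum), logging geometric complexity during training.
import jax
import jax.numpy as jnp


def cross_entropy(params, xs, ys):
    """Cross-entropy loss, one of the exponential-family losses the theory covers."""
    logits = jax.vmap(lambda x: mlp(params, x))(xs)
    logp = jax.nn.log_softmax(logits)
    return -jnp.mean(jnp.take_along_axis(logp, ys[:, None], axis=1))


@jax.jit
def sgd_step(params, xs, ys, lr):
    """One plain SGD update: params <- params - lr * grad (no momentum term)."""
    grads = jax.grad(cross_entropy)(params, xs, ys)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)


key = jax.random.PRNGKey(1)
params = init_mlp(key, [16, 64, 64, 10])
xs = jax.random.normal(key, (32, 16))                          # synthetic inputs
ys = jax.random.randint(jax.random.PRNGKey(2), (32,), 0, 10)   # synthetic labels

for step in range(501):
    params = sgd_step(params, xs, ys, lr=0.1)
    if step % 100 == 0:
        print(step,
              float(cross_entropy(params, xs, ys)),
              float(geometric_complexity(params, xs)))
```

Sweeping the learning rate or batch size in a loop like this is the kind of controlled comparison the paper uses to show how individual heuristics press geometric complexity down, although the actual experiments run on MNIST- and CIFAR-scale models.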

The theoretical arguments in this study are limited to DNN architectures with ReLU activations and log-likelihood losses from the exponential family, such as least-squares loss for multi-dimensional regression or cross-entropy loss for multi-class classification. The experiments use DNN and ResNet architectures on the MNIST and CIFAR image datasets. The researchers are now focused on deepening their understanding of how widely used training techniques affect geometric complexity, and they plan to use this understanding to improve training methods.

Overall, geometric complexity is a useful lens for understanding deep learning, as it helps explain how highly expressive neural networks can still achieve low test error. The team hopes the findings will spark further inquiry into this link, shedding light on current state-of-the-art methods and paving the way toward even better ways of training models.

This article is written as a research summary by Marktechpost staff based on the preprint 'Why neural networks find simple solutions: the many regularizers of geometric complexity'. All credit for this research goes to the researchers on this project. Check out the paper.



Tanushree Shenwai is a consulting intern at MarktechPost. She is currently pursuing her B.Tech from the Indian Institute of Technology (IIT), Bhubaneswar. She is a Data Science enthusiast and has a keen interest in the scope of application of artificial intelligence in various fields. She is passionate about exploring new advancements in technologies and their real-life applications.

