DeepMind Researchers Propose ‘ReLICv2’: Pushing the Limits of Self-Supervised ResNets

Supervised learning architectures generally require a massive amount of labelled data, and acquiring high-quality labels at that scale can be very costly and time-consuming. The main idea behind self-supervised methods in deep learning is to learn patterns from unlabelled data and then fine-tune the model with a small amount of labelled data. Self-supervised learning with residual networks has progressed rapidly in recent years, but these models still underperform their supervised counterparts by a large margin on the ImageNet classification benchmark. This performance gap has so far prevented the use of self-supervised models in performance-critical scenarios.

To close this gap, a team of researchers from DeepMind has proposed a new method called ReLICv2, which improves on previous self-supervised methods and consistently outperforms supervised learning baselines. ReLICv2 builds on the earlier ReLIC framework (“Representation Learning via Invariant Causal Mechanisms”), which uses the principle of invariant prediction for representation learning. ReLICv2 (ReLIC version 2) therefore inherits ReLIC’s ability to solve downstream classification tasks.

ReLICv2 takes the ReLIC (Mitrovic et al., 2021) framework and augments it with better strategies for selecting similar and dissimilar points, which are incorporated into both the contrastive and invariance objectives (a sketch of this combined objective follows the list below). The researchers’ experiments demonstrate that theirs is the first self-supervised model to outperform the supervised baseline by changing only the self-supervised training scheme while keeping the network architecture identical to that of the baseline. The significant contributions of the work can be summarised as follows:

1) Improvements upon the ReLIC model

2) Insights into the representation learning capability of ReLICv2 and the scalability of the proposed scheme

3) A demonstration that the proposed ReLICv2 performs comparably to state-of-the-art self-supervised vision transformers
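To make the combined objective concrete, here is a minimal sketch in PyTorch of a ReLIC-style loss: an InfoNCE-style contrastive term plus a symmetrised KL invariance penalty that encourages the similarity distribution to be stable across augmentations. The function name, temperature, and weighting coefficient alpha are illustrative assumptions rather than values from the paper, and ReLICv2’s specific positive/negative selection strategies are omitted.

# Minimal sketch (not the authors' code) of a ReLIC-style objective:
# a contrastive term plus a KL-divergence invariance penalty.
import torch
import torch.nn.functional as F

def relic_loss(z1, z2, temperature=0.1, alpha=1.0):
    """z1, z2: embeddings of two augmented views of a batch, shape (N, D)."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                 # (N, N) similarities
    labels = torch.arange(z1.size(0), device=z1.device)

    # Contrastive term: the two views of each image form the positive pair;
    # all other images in the batch act as negatives.
    contrastive = F.cross_entropy(logits, labels)

    # Invariance term: the similarity distribution should not depend on
    # which view plays the role of the anchor (symmetrised KL divergence).
    log_p1 = F.log_softmax(logits, dim=1)
    log_p2 = F.log_softmax(logits.t(), dim=1)
    invariance = 0.5 * (
        F.kl_div(log_p1, log_p2.exp(), reduction="batchmean")
        + F.kl_div(log_p2, log_p1.exp(), reduction="batchmean")
    )
    return contrastive + alpha * invariance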

The pseudo-code explaining the overall pipeline of the proposed ReLICv2 is shown below:

Source: https://arxiv.org/pdf/2201.05119.pdf
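Since the figure itself is only reproduced in the paper, the following is a rough illustrative sketch of one training step, reusing the relic_loss function sketched above. The EMA target network, the augment interface, and all names here are assumptions for illustration; ReLICv2’s multi-crop views and saliency masking are omitted for brevity.

import torch

@torch.no_grad()
def ema_update(target, online, decay=0.996):
    # The target network's parameters track an exponential moving
    # average of the online network's parameters.
    for pt, po in zip(target.parameters(), online.parameters()):
        pt.mul_(decay).add_(po, alpha=1.0 - decay)

def train_step(online, target, images, augment, optimizer):
    # Two randomly augmented views of the same batch of images.
    v1, v2 = augment(images), augment(images)
    z1 = online(v1)                      # embeddings from the online network
    with torch.no_grad():
        z2 = target(v2)                  # embeddings from the target network
    loss = relic_loss(z1, z2)            # contrastive + invariance (sketched above)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    ema_update(target, online)           # target starts as copy.deepcopy(online)
    return loss.item()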

ReLICv2 achieves 77.1% top-1 classification accuracy on ImageNet with a ResNet50 and 80.6% with a ResNet200 2x. It outperforms the supervised ResNet50 baseline on linear ImageNet evaluation across the 1x, 2x, and 4x variants, as depicted in the plot below:

Source: https://arxiv.org/pdf/2201.05119.pdf
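For readers unfamiliar with the protocol, linear evaluation trains only a single linear classifier on top of the frozen pretrained representation. A minimal sketch is given below; the 2048-dimensional feature size corresponds to a standard ResNet50 backbone, and the function name and hyperparameters are illustrative assumptions rather than the paper’s exact settings.

import torch
import torch.nn as nn

def linear_eval(encoder, train_loader, num_classes=1000, epochs=10, lr=0.1):
    # Freeze the pretrained backbone; only the linear head is trained.
    encoder.eval()
    for p in encoder.parameters():
        p.requires_grad = False

    head = nn.Linear(2048, num_classes)   # 2048 = ResNet50 feature dimension
    opt = torch.optim.SGD(head.parameters(), lr=lr, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for images, labels in train_loader:
            with torch.no_grad():
                feats = encoder(images)   # fixed (N, 2048) representations
            loss = loss_fn(head(feats), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return head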

The transfer performance relative to the supervised baseline is summarised in the figure below:

Source: https://arxiv.org/pdf/2201.05119.pdf

The paper also provides a considerable number of rigorous experiments and analyses, including out-of-distribution generalisation, large-scale transfer, latent-space analysis, scaling analysis, ablation studies, and saliency masking. The details of these experiments can be found in the original paper, linked below.

As unlabelled data becomes more and more abundant, it is essential to shift towards a paradigm of self-supervised learning rather than relying solely on traditional supervised deep learning architectures. The work carried out by the research team demonstrates that self-supervised learning can outperform some supervised schemes on image classification tasks. Beyond image classification, self-supervised learning has enormous potential in face detection, signature recognition, sentence completion, stock price and trend prediction, and image colourisation. For all of these tasks, a vast amount of unlabelled data is available and growing every day. Advanced self-supervised learning schemes, such as the one discussed in the paper, can give scientists and engineers the power to tap into the potential that these massive unlabelled datasets promise.

The framework proposed in this work outperforms many of the vision transformers that have recently emerged as promising visual representation learning architectures, despite those models using more sophisticated architectures and more involved training procedures. For future work, the researchers propose combining recent architectural and training innovations with the ReLICv2 framework, which could lead to further improvements.

Paper: https://arxiv.org/pdf/2201.05119.pdf

