Researchers from Princeton Introduce Sheared-LLaMA Models for Accelerating Language Model Pre-Training via Structured Pruning

Large Language Models (LLMs) have become extremely popular because of their outstanding capabilities on a variety of natural language tasks. Although the field is advancing rapidly, the massive computational resources needed to train these models remain a major drawback. Consequently, there has been a surge of interest in more compact and efficient LLMs, such as LLaMA, MPT, and Falcon. These medium-sized models are intended to support various use cases by providing efficient inference and fine-tuning. However, training even the smallest billion-parameter LLMs from scratch is prohibitively expensive for many organizations due to the significant computational resources required.

Researchers have previously demonstrated that, like moderate-sized LLMs such as LLaMA, smaller language models can be just as powerful. These models are seen as a more efficient substitute for large LLMs, which require a great deal of processing power to train. In a recent study, a team of researchers examined the usefulness of structured pruning as a technique for compressing larger, pre-trained models into smaller LLMs. This method makes use of two essential strategies, which are as follows.

  1. Targeted Structured Pruning: This technique methodically removes layers, attention heads, and intermediate and hidden dimensions from a larger language model in order to trim it down to a target configuration. Because the procedure is carried out end to end, the model’s coherence and functioning are preserved, and it is optimized without sacrificing vital language comprehension abilities. A minimal illustration of head pruning appears after this list.
  2. Dynamic Batch Loading: This method adjusts the composition of the training data within each batch according to the evolving loss in different domains. By dynamically modifying the data samples used in each batch, it ensures that the model concentrates more on the tasks or domains where it is underperforming, thereby improving overall training efficiency. A sketch of this re-weighting step also follows below.
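To make the first idea concrete, here is a minimal, hypothetical sketch in Python of what structured head pruning can look like; the function, tensor shapes, and importance scores below are illustrative assumptions, not the paper’s implementation. The idea is to keep only the attention heads with the highest learned importance scores and drop the rest in whole blocks, so the remaining weights still form a valid, smaller attention layer.

```python
import torch

def prune_attention_heads(qkv_weight, head_scores, num_heads, target_heads):
    """Keep only `target_heads` attention heads, dropping those with the
    lowest importance scores. qkv_weight is assumed to have shape
    (3 * num_heads * head_dim, hidden_dim), i.e. fused Q, K, V projections."""
    head_dim = qkv_weight.shape[0] // (3 * num_heads)
    # Indices of the surviving heads, kept in their original order.
    keep = torch.topk(head_scores, target_heads).indices.sort().values
    # Gather the rows of Q, K and V that belong to the surviving heads.
    blocks = qkv_weight.view(3, num_heads, head_dim, -1)
    pruned = blocks[:, keep].reshape(3 * target_heads * head_dim, -1)
    return pruned

# Hypothetical shapes: 32 heads of dimension 128, hidden size 4096.
qkv = torch.randn(3 * 32 * 128, 4096)
scores = torch.rand(32)                # stand-in for learned per-head importance
smaller_qkv = prune_attention_heads(qkv, scores, num_heads=32, target_heads=20)
print(smaller_qkv.shape)               # torch.Size([7680, 4096])
```

Dynamic batch loading can be sketched in a similarly simplified way. The snippet below is an assumed re-weighting step (the function name, domain names, and loss values are made up for illustration): domains whose current loss lags furthest behind a reference target receive a larger share of the next batch.

```python
import numpy as np

def update_domain_weights(current_losses, reference_losses, prev_weights, temperature=1.0):
    """Re-weight training domains based on how far each domain's loss lags
    behind its reference loss. Larger gaps mean the domain is sampled more
    often when composing the next batch."""
    gaps = np.maximum(np.array(current_losses) - np.array(reference_losses), 0.0)
    # Exponentiate the loss gaps and renormalize (a softmax-style update).
    scores = np.array(prev_weights) * np.exp(gaps / temperature)
    return scores / scores.sum()

# Hypothetical usage with three data domains.
domains = ["web", "code", "books"]
weights = np.full(3, 1 / 3)            # start from uniform sampling
current = [2.9, 1.8, 3.4]              # measured loss per domain
reference = [2.7, 1.9, 3.0]            # target losses from a reference model
weights = update_domain_weights(current, reference, weights)
print(dict(zip(domains, weights.round(3))))  # "web" and "books" get boosted
```

In both sketches the key point is the same as in the article: pruning removes entire structural units rather than individual weights, and batch composition is adjusted on the fly based on per-domain loss rather than being fixed in advance.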

Sheared-LLaMA-1.3B and Sheared-LLaMA-2.7B, two smaller LLMs created by pruning an LLaMA2-7B model, show how effective the proposed procedure is. The pruning process consumes only 50 billion training tokens, or 5% of OpenLLaMA’s pre-training budget. Despite this reduced budget, Sheared-LLaMA-1.3B and Sheared-LLaMA-2.7B perform better on 11 typical downstream tasks than other well-known LLMs of comparable scale, such as Pythia, INCITE, and OpenLLaMA. These tasks cover a variety of areas, including instruction tuning for open-ended generation, reading comprehension, common-sense understanding, and world knowledge.

Based on the performance trajectory of the pruned models, additional training with more tokens may yield even greater gains. While the current study’s experiments are limited to models with at most 7 billion parameters, the LLM-shearing technique is designed to generalize and could be extended to language models of any size in future work.

To sum up, LLM shearing provides a complete approach to LLM size reduction via dynamic batch loading and targeted structured pruning. The Sheared-LLaMA models, which outperform equivalently sized models on a variety of downstream tasks, are an effective demonstration of it. This method shows how smaller yet strong LLMs can be developed more efficiently and economically, and it can be applied across a wide range of model sizes.


Check out the Paper, GitHub, and Project. All credit for this research goes to the researchers on this project.


Tanya Malhotra is a final year undergrad from the University of Petroleum & Energy Studies, Dehradun, pursuing BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with strong analytical and critical thinking skills, along with an ardent interest in acquiring new skills, leading teams, and managing work in an organized manner.

