Meet I2D2: A Novel AI Framework for Generating Generic Knowledge from Language Models Using Constrained Decoding and Self-Imitation Learning

On Jul 16, 2023

The rapid progress of language models has largely been attributed to their massive scale, which enables strong performance across a wide range of natural language processing tasks. This raises a question: is scale the only determinant of model performance? A recent study challenges that assumption and investigates whether smaller models can compete with the largest models available today. By combining distillation, constrained decoding, and self-imitation learning, the study introduces a framework called I2D2 that enables smaller language models to outperform models 100 times larger at generating generic commonsense knowledge.

Empowering Smaller Models with I2D2

The primary challenge smaller language models face is their relatively lower generation quality. The I2D2 framework addresses this through two key innovations. First, it uses NeuroLogic decoding to perform constrained generation, which yields slight improvements in generation quality. Second, it adds a small critic model that filters out low-quality generations, which yields substantial improvements. In the subsequent self-imitation step, the language model is fine-tuned on its own high-quality generations that survive critic filtering. These steps can be applied iteratively to keep improving the smaller model.
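The loop described above (constrained generation, critic filtering, self-imitation, repeat) can be illustrated with a toy sketch. This is not the authors' implementation. Everything here is a hypothetical stand-in chosen so the example runs on its own: the "language model" is a weighted distribution over a few templates, the constraint is a simple check that the concept word appears (the real framework uses NeuroLogic decoding), the critic is a hand-written rule rather than a trained classifier, and "fine-tuning" just raises the weight of templates whose outputs passed the critic.

```python
import random
from collections import Counter

# Toy "language model": a weighted distribution over statement templates.
# Hypothetical stand-in; the real framework fine-tunes a neural LM.
TEMPLATES = {
    "{c} are used for work": 1.0,
    "{c} can be found in homes": 1.0,
    "{c} banana purple seven": 1.0,  # deliberately low-quality
    "things are things": 1.0,        # never mentions the concept
}


def generate(weights, concept, n, rng):
    """Sample n statements, keeping only those that mention the concept.

    The filter is a crude stand-in for constrained decoding.
    """
    templates = list(weights)
    picks = rng.choices(templates, weights=[weights[t] for t in templates], k=n)
    candidates = [(t, t.format(c=concept)) for t in picks]
    return [(t, s) for t, s in candidates if concept in s]


def critic_score(statement):
    """Hypothetical critic: fraction of words that are not nonsense tokens."""
    nonsense = {"banana", "purple", "seven"}
    words = statement.split()
    return 1.0 - sum(w in nonsense for w in words) / len(words)


def self_imitate(weights, kept, lr=1.0):
    """Stand-in for fine-tuning: upweight templates whose outputs were kept."""
    counts = Counter(t for t, _ in kept)
    return {t: w + lr * counts[t] for t, w in weights.items()}


def i2d2_loop(weights, concept, iterations=3, n=200, threshold=0.8, seed=0):
    """Run generate -> filter -> self-imitate for a few iterations.

    Returns the final weights and, per iteration, the fraction of the n
    raw samples that passed both the constraint and the critic.
    """
    rng = random.Random(seed)
    history = []
    for _ in range(iterations):
        generated = generate(weights, concept, n, rng)
        kept = [(t, s) for t, s in generated if critic_score(s) >= threshold]
        history.append(len(kept) / n)
        weights = self_imitate(weights, kept)
    return weights, history


if __name__ == "__main__":
    final_weights, pass_rates = i2d2_loop(dict(TEMPLATES), "hammers")
    print(pass_rates)
    print(final_weights)
```

In this toy setting the pass rate rises across iterations because the model is only ever trained on its own filtered outputs, which is the intuition behind the self-imitation step.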


Application to Generating Commonsense Knowledge

When generating commonsense knowledge about everyday concepts, the I2D2 framework demonstrates impressive results. Unlike other approaches that rely on GPT-3 generations for knowledge distillation, I2D2 does not depend on GPT-3 output. Despite being based on a model that is 100 times smaller than GPT-3, I2D2 generates a high-quality corpus of generic commonsense knowledge.

Outperforming Larger Models

Comparative analysis shows that I2D2 outperforms GPT-3 in accuracy when generating generics. Comparing the accuracy of generics in GenericsKB, those generated by GPT-3, and those generated by I2D2 shows that I2D2 achieves the highest accuracy despite its smaller model size. The framework's critic model plays a pivotal role: it also outperforms GPT-3 at distinguishing true commonsense statements from false ones.

Enhanced Diversity and Iterative Improvement

In addition to improved accuracy, I2D2 produces more diverse generations than GenericsKB. The generated content is ten times more diverse, and diversity continues to improve with successive iterations of self-imitation. These findings illustrate the robustness of I2D2 in generating accurate and diverse generic statements, all while using a model that is 100 times smaller than its competitors.

Implications of the Study

The key findings of this study have far-reaching implications for natural language processing. They show that smaller, more efficient language models have significant room for improvement. With algorithmic techniques such as those introduced in I2D2, smaller models can rival the performance of larger models on specific tasks. The study also challenges the notion that self-improvement is exclusive to large-scale language models: I2D2 shows that smaller models can iteratively improve their own generation quality.


Niharika is a technical consulting intern at Marktechpost. She is a third-year undergraduate pursuing her B.Tech at the Indian Institute of Technology (IIT), Kharagpur. She has a keen interest in machine learning, data science, and AI, and is an avid reader of the latest developments in these fields.
