Can Small Language Models Deliver High Performance? Meet StableLM: An Open-Source Language Model That Generates Text and Code, Achieving High Performance With Proper Training
Stability AI, an artificial intelligence startup best known for its Stable Diffusion image-generation technology, has introduced a new free and open-source language model called StableLM. For the Alpha phase, the model is offered in two parameter sizes, three billion and seven billion, with fifteen-billion and sixty-five-billion-parameter models to follow. Under the CC BY-SA 4.0 license, developers can review, use, and modify the StableLM base models for personal and commercial projects.
In 2022, Stability AI released the groundbreaking Stable Diffusion image model, a more open, scalable, and transparent alternative to proprietary AI. With the StableLM suite, the company furthers its mission to democratize foundational AI capabilities. The StableLM models will power a variety of applications with text and code generation, and they demonstrate how small, efficient models can be trained to deliver strong performance.
The team’s prior open-source work with EleutherAI, a non-profit research hub, laid the groundwork for the release of StableLM. The open-source dataset The Pile was used to train several popular language models, including GPT-J, GPT-NeoX, and the Pythia suite. Newer open-source language models, such as Cerebras-GPT and Dolly-2, build on these earlier efforts.
The experimental dataset used to train StableLM builds on The Pile but is three times larger, at 1.5 trillion tokens. Despite having only 3 to 7 billion parameters (GPT-3 has 175 billion), StableLM achieves surprisingly strong performance on conversational and coding tasks thanks to the richness of this dataset. Details about the dataset will be made public at a later date.
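For context, here is a minimal sketch of how a developer might load and sample from one of the base checkpoints with the Hugging Face transformers library. The model ID `stabilityai/stablelm-base-alpha-7b` reflects the naming used on the Hugging Face Hub at release, and the sampling settings are illustrative assumptions rather than official recommendations:

```python
# A minimal sketch: loading a StableLM-Alpha base checkpoint with
# Hugging Face transformers. The model ID and sampling settings are
# illustrative assumptions, not official recommendations.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "stabilityai/stablelm-base-alpha-7b"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # halves memory footprint on a GPU
    device_map="auto",          # places layers on available devices
)

prompt = "Write a Python function that reverses a string."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Since the base models are not instruction-tuned, they behave as plain text completers: the prompt is continued rather than answered, so prompts are best phrased as text to be extended.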
They have also released a set of instruction fine-tuned research models. These fine-tuned models draw on five recently released open-source conversational-agent datasets: Alpaca, GPT4All, Dolly, ShareGPT, and HH. In keeping with Stanford’s Alpaca license, these fine-tuned models are available under a noncommercial CC BY-NC-SA 4.0 license for academic research.
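As an illustration of how such an instruction-tuned checkpoint might be queried, the sketch below uses the `<|SYSTEM|>`/`<|USER|>`/`<|ASSISTANT|>` turn markers associated with the tuned Alpha models; the model ID, system prompt text, and generation settings are assumptions for demonstration purposes:

```python
# A hedged sketch of prompting a StableLM tuned checkpoint using the
# <|SYSTEM|>/<|USER|>/<|ASSISTANT|> turn format associated with the
# Alpha release. Model ID and prompt wording are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "stabilityai/stablelm-tuned-alpha-7b"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

system = "<|SYSTEM|>You are StableLM, a helpful and harmless assistant."
user = "<|USER|>Explain what an open-source language model is."
prompt = f"{system}{user}<|ASSISTANT|>"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs, max_new_tokens=128, do_sample=True, temperature=0.7
)
# Slice off the prompt tokens so only the assistant's reply is printed.
reply = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(reply, skip_special_tokens=True))
```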
StableLM reflects the team’s vision of developing open, approachable, and supportive AI technology through the following capabilities:
- Transparency: Researchers can “look under the hood” to verify performance, develop interpretability techniques, identify risks, and help create safeguards. Businesses and government agencies can adapt (or “fine-tune”) these open-source models to their needs without disclosing private data or ceding control over AI capabilities.
- Accessibility: The team designs for the edge so that everyday users can run these models on their own devices. Rather than depending on exclusive services from a few companies, developers can use these models to build applications that run on a broad range of publicly available hardware, spreading the economic benefits of AI across a large community of users and creators. Because the models are open and fine-grained, researchers and academics can move beyond the interpretability and safety limitations of closed models.
- Supportive: These models are built to help users, not replace them. Rather than pursuing superhuman intelligence, the team focuses on improving AI’s ability to perform specific tasks in real-world contexts. They build tools that enable everyday people and businesses to harness AI’s potential to foster innovation, increase productivity, and expand economic opportunity.
The team notes that, as with any pretrained large language model lacking fine-tuning and reinforcement learning, the quality of responses may vary and may include offensive language or views. Scale, more data, community feedback, and optimization should all lead to considerable improvement.
Check out the GitHub and Stability AI Blog.
Tanushree Shenwai is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Bhubaneswar. She is a Data Science enthusiast with a keen interest in the applications of artificial intelligence across various fields, and is passionate about exploring new advances in technology and their real-life applications.