Meet MiniCPM: An End-Side LLM with only 2.4B Parameters Excluding Embeddings

On Feb 25, 2024

In the fast-evolving world of technology, language models play a crucial role in applications ranging from question answering to text generation. One challenge these models face, however, is their size: large models are expensive to run and difficult to deploy on resource-constrained hardware, which limits where and how they can be used. Developers and researchers are therefore looking for models that are both efficient and capable.

Some existing models have tried to balance size and performance, but they still struggle with problems such as inaccurate or hallucinated responses, and longer outputs tend to contain more of these errors. Researchers continue to work on making small language models both more efficient and more reliable.

Meet MiniCPM, a language model developed by ModelBest Inc. and TsinghuaNLP. MiniCPM has only 2.4 billion parameters, excluding embeddings, making it compact yet capable. It performs close to larger models on general benchmarks and is particularly strong in Chinese, mathematics, and coding. After further optimization, MiniCPM even outperforms several other models on specific tasks, such as machine translation.
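Because the headline figure excludes embeddings, it helps to see how the two parts of a model's parameter budget are counted separately. The sketch below assumes a generic decoder-only transformer with a gated MLP and a single embedding table shared with the output head; the configuration values are hypothetical illustrations, not MiniCPM's published configuration.

```python
def embedding_params(vocab_size: int, hidden: int) -> int:
    # One vocab_size x hidden lookup table, assumed shared with the output head.
    return vocab_size * hidden


def non_embedding_params(hidden: int, ffn: int, layers: int) -> int:
    attn = 4 * hidden * hidden  # Q, K, V and output projections, no biases
    mlp = 3 * hidden * ffn      # gated MLP: gate, up and down projections
    norms = 2 * hidden          # two RMSNorm weight vectors per layer
    return layers * (attn + mlp + norms) + hidden  # plus one final norm


# Hypothetical configuration chosen only for illustration.
VOCAB, HIDDEN, FFN, LAYERS = 122_753, 2304, 5760, 40

body = non_embedding_params(HIDDEN, FFN, LAYERS)
emb = embedding_params(VOCAB, HIDDEN)
print(f"non-embedding: {body / 1e9:.2f}B, embedding: {emb / 1e9:.2f}B")
```

With a large vocabulary, the embedding table alone can add a few hundred million parameters, which is why small-model reports often quote the non-embedding count separately.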

One significant advantage of MiniCPM is that it can be deployed and run on smartphones, offering a convenient and accessible way to use it. Its streaming output is notably fast, exceeding the speed of human speech. The same approach extends to multimodal models, which can also be deployed on smartphones, further broadening the model's practical applications.

Building on MiniCPM is also cost-effective: parameter-efficient fine-tuning can be done on relatively common GPUs, putting it within reach of most developers. The model's parameters are released for research and limited commercial use, supporting transparency and collaboration in the community.
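The article does not say which parameter-efficient method is used, so the sketch below swaps in LoRA, one widely used technique, to show why such fine-tuning fits on modest GPUs. LoRA freezes a weight matrix W and trains only a low-rank update B·A, so the trainable count per matrix is rank × (d_in + d_out) instead of d_in × d_out. The hidden size, layer count, rank, and choice of target matrices are hypothetical.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    # LoRA keeps W (d_out x d_in) frozen and learns B (d_out x rank) and A (rank x d_in).
    return rank * (d_in + d_out)


def full_params(d_in: int, d_out: int) -> int:
    # Full fine-tuning updates every entry of W.
    return d_in * d_out


# Hypothetical settings for illustration only.
hidden, layers, rank = 2304, 40, 8
targets_per_layer = 2  # assumption: adapt only the query and value projections

lora_total = layers * targets_per_layer * lora_trainable_params(hidden, hidden, rank)
full_total = layers * targets_per_layer * full_params(hidden, hidden)
print(lora_total, full_total, f"{lora_total / full_total:.2%}")
```

For square d × d matrices the ratio simplifies to 2·rank / d, so the trainable fraction shrinks as the model's hidden size grows, and optimizer state is needed only for the small adapter matrices.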

Like any technology, however, MiniCPM has limitations. Because of its small size, the model sometimes produces hallucinated responses, especially in longer outputs, and its knowledge recall is less accurate than that of larger models. The development team has said it will continue to iterate on and improve the model.

For evaluation, all models are tested with a unified prompt format across a range of tasks, and the evaluation scripts and prompts are publicly available on GitHub. Using the same inputs for every model, while accounting for how each task is scored, is intended to keep the comparisons fair.

In conclusion, MiniCPM is a promising step forward for small language models. Its compact size, low fine-tuning cost, and ability to run on smartphones make it a valuable tool for developers and researchers alike. While it has clear limitations, the ongoing work to improve MiniCPM shows a commitment to addressing them and advancing the field of language modeling.

Niharika is a technical consulting intern at Marktechpost. She is a third-year undergraduate pursuing her B.Tech at the Indian Institute of Technology (IIT), Kharagpur. She is a highly enthusiastic individual with a keen interest in machine learning, data science, and AI, and an avid reader of the latest developments in these fields.