Meet ChatGLM2-6B: The Second-Generation Version of the Open-Source Bilingual (Chinese-English) Chat Model ChatGLM-6B
Since the introduction of OpenAI’s ChatGPT, which reached 100 million users faster than any consumer product before it, the field of natural language conversation agents has advanced considerably. Researchers are actively exploring techniques to make chatbot models’ interactions with users more natural and engaging, and several open-source, lightweight alternatives to ChatGPT have been released. One such alternative is the ChatGLM model series developed by researchers at Tsinghua University, China. The series is built on the General Language Model (GLM) framework rather than the more common Generative Pre-trained Transformer (GPT) architecture, and it comprises several bilingual models trained on Chinese and English. The best known of these is ChatGLM-6B, which has 6.2 billion parameters. The model was pre-trained on over 1 trillion English and Chinese tokens and further fine-tuned for Chinese question answering, summarization, and conversational tasks using techniques such as reinforcement learning from human feedback (RLHF).
Another standout feature of ChatGLM-6B is that, thanks to quantization, it can be deployed locally with modest hardware requirements, even on consumer-grade graphics cards. The model has become exceptionally popular, particularly in China, with over 2 million downloads worldwide, making it one of the most influential large-scale open-source models. Building on this adoption, the Tsinghua University researchers released ChatGLM2-6B, the second-generation version of the bilingual chat model. ChatGLM2-6B retains the strengths of the first-generation model and adds several new features, including better benchmark performance, support for longer contexts, and more efficient inference. Furthermore, whereas the weights were previously released for academic purposes only, the research team has now also made them available for commercial use.
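For readers who want to try this locally, the sketch below shows one common way to load ChatGLM2-6B through Hugging Face Transformers with 4-bit quantization. The quantize() and chat() helpers come from the model’s own remote code rather than the core Transformers API, and the exact calls may change between releases, so treat this as an illustrative example rather than an official recipe.

```python
# Minimal local-deployment sketch for ChatGLM2-6B with 4-bit quantization.
# Assumes the "THUDM/chatglm2-6b" checkpoint and the quantize()/chat() helpers
# shipped with its remote code; check the model card for the current API.
from transformers import AutoModel, AutoTokenizer

model_id = "THUDM/chatglm2-6b"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# quantize(4) packs the weights into ~4 bits so the model fits on a
# consumer-grade GPU; omit it to run in full half precision instead.
model = AutoModel.from_pretrained(model_id, trust_remote_code=True).quantize(4).cuda()
model = model.eval()

# chat() carries the running dialogue in `history`, so follow-up questions
# are answered with earlier turns as context.
response, history = model.chat(tokenizer, "What is ChatGLM2-6B?", history=[])
print(response)
```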
As a starting point, the researchers upgraded the base model of ChatGLM2-6B relative to the first-generation version. ChatGLM2-6B uses the hybrid objective function of GLM and was pre-trained on over 1.4 trillion English and Chinese tokens. The researchers evaluated the model against competitive models of roughly the same size and found that ChatGLM2-6B achieves noticeable improvements on benchmarks such as MMLU, C-Eval, and BBH. Another impressive upgrade is longer context support, extended from 2K tokens in the previous version to 32K. The FlashAttention algorithm is instrumental here, speeding up the attention computation and reducing its memory consumption on long sequences. The model was also trained with a context length of 8K during dialogue alignment to offer users more conversational depth. In addition, ChatGLM2-6B uses the Multi-Query Attention technique, which lowers the GPU memory usage of the KV cache and yields roughly 42% faster inference than the first generation.
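To make the Multi-Query Attention point concrete, the back-of-the-envelope sketch below compares the size of the key/value cache under standard multi-head attention, where every attention head stores its own keys and values, with MQA, where all query heads share a single key/value head. The layer count, head count, and head dimension are illustrative assumptions, not ChatGLM2-6B’s published configuration.

```python
# Rough KV-cache size comparison: multi-head attention (MHA) vs. multi-query attention (MQA).
# The model dimensions below are illustrative assumptions, not ChatGLM2-6B's actual config.

def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int, seq_len: int,
                   bytes_per_elem: int = 2) -> int:
    # Keys and values (factor of 2) are cached per layer for every token in the context;
    # bytes_per_elem=2 corresponds to fp16 storage.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

layers, heads, head_dim, seq_len = 28, 32, 128, 32_768  # assumed values, 32K context

mha = kv_cache_bytes(layers, kv_heads=heads, head_dim=head_dim, seq_len=seq_len)
mqa = kv_cache_bytes(layers, kv_heads=1, head_dim=head_dim, seq_len=seq_len)

print(f"MHA KV cache: {mha / 2**30:.1f} GiB")  # every head keeps its own K/V
print(f"MQA KV cache: {mqa / 2**30:.1f} GiB")  # one shared K/V head per layer
print(f"Reduction: {heads}x")
```

Shrinking the cache this way is a large part of what makes a 32K-token context practical on a single GPU, and it also speeds up decoding because far less data has to be read from memory at each generation step.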
The researchers at Tsinghua University have open-sourced ChatGLM2-6B in the hope that developers and researchers worldwide will build useful applications on top of the model and help drive the growth and innovation of LLMs. They also caution that, given the model’s relatively small scale, its outputs can be influenced by randomness and should be carefully fact-checked for accuracy. As for future work, the team is thinking one step ahead and has already started working on the third version of the model, ChatGLM3.
Check out the GitHub link. All credit for this research goes to the researchers on this project.
Khushboo Gupta is a consulting intern at MarktechPost. She is currently pursuing her B.Tech from the Indian Institute of Technology (IIT), Goa. She is passionate about the fields of Machine Learning, Natural Language Processing, and Web Development. She enjoys learning more about the technical field by participating in several challenges.