Can LLMs Run Natively on Your iPhone? Meet MLC-LLM: An Open Framework that Brings Language Models (LLMs) Directly into a Broad Class of Platforms with GPU Acceleration

On May 3, 2023

Large Language Models (LLMs) are a current focus in the field of Artificial Intelligence and are already being applied in a wide range of industries such as healthcare, finance, education, and entertainment. Well-known language models such as GPT and BERT handle a broad set of language tasks, and related generative models such as DALL-E extend the same idea to images. GPT-3 can complete code, answer questions in a human-like way, and generate content from a short natural language prompt, while DALL-E 2 creates images from a simple textual description. These models are driving a significant shift in Artificial Intelligence and Machine Learning.

As the number of models grows, so does the need for powerful servers to meet their extensive computational, memory, and hardware acceleration requirements. Running these models directly on consumer devices would make them more accessible and available: users could use powerful AI tools on their personal devices without an internet connection or cloud servers. The recently introduced MLC-LLM is an open framework that brings LLMs directly to a broad class of platforms, such as CUDA, Vulkan, and Metal, with GPU acceleration.

MLC LLM enables language models to be deployed natively on a wide range of hardware backends, including CPUs and GPUs, and in native applications. This means that language models can be run on local devices without a server or cloud-based infrastructure. MLC LLM provides a productive framework that lets developers optimize model performance for their own use cases, such as Natural Language Processing (NLP) or Computer Vision. Models can also be accelerated using local GPUs, making it possible to run complex models with high accuracy and speed on personal devices.

Specific instructions for running LLMs and chatbots natively on devices have been provided for iPhone, Windows, Linux, Mac, and web browsers. For iPhone users, MLC LLM provides an iOS chat app that can be installed through the TestFlight page. The app requires at least 6 GB of memory to run smoothly and has been tested on iPhone 14 Pro Max and iPhone 12 Pro. Text generation speed in the iOS app can be unstable at times, and it may run slowly at the beginning before recovering to normal speed.
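To give a rough sense of why a memory floor of this kind matters, the weights of a model occupy approximately (number of parameters × bits per weight ÷ 8) bytes. The following is a minimal back-of-the-envelope sketch, not a description of how MLC LLM measures or allocates memory. The 7-billion-parameter model size and the bit widths are illustrative assumptions that do not come from the article, and the estimate ignores activations, caches, and the app's own overhead.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / (1024 ** 3)


if __name__ == "__main__":
    # Hypothetical model size (7e9 parameters) and bit widths: assumptions
    # chosen for illustration only, not figures from the article.
    for bits in (16, 4, 3):
        print(f"{bits}-bit weights: {weight_memory_gib(7e9, bits):.1f} GiB")
```

Under these assumptions, 16-bit weights alone would exceed 6 GB, while a 3-bit or 4-bit quantized copy would fit with room left for everything else the app needs.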

For Windows, Linux, and Mac users, MLC LLM provides a command-line interface (CLI) app for chatting with the bot in the terminal. Before installing the CLI app, users should install some dependencies: Conda, which is used to manage the app, and, for NVIDIA GPU users on Windows and Linux, the latest Vulkan driver. After installing the dependencies, users can follow the instructions to install the CLI app and start chatting with the bot. For web browser users, MLC LLM provides a companion project called WebLLM, which deploys models natively to browsers. Everything runs inside the browser with no server support and is accelerated with WebGPU.
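Before following the CLI installation instructions, it can help to confirm that the dependencies are visible on the system. The sketch below is a generic pre-flight check using only the Python standard library; it is not part of MLC LLM. The `conda` executable is the dependency named above, while `vulkaninfo` is an assumption: it is a diagnostic tool commonly distributed with the Vulkan SDK or a vulkan-tools package, and its absence does not necessarily mean the Vulkan driver is missing.

```python
import shutil


def find_tools(names):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # "conda" comes from the article; "vulkaninfo" is an assumed stand-in
    # for checking that Vulkan tooling is installed.
    for name, path in find_tools(["conda", "vulkaninfo"]).items():
        print(f"{name}: {path or 'not found on PATH'}")
```

A missing entry only indicates that the executable is not on the current `PATH`; the project's own installation instructions remain the authority on what is required.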

In conclusion, MLC LLM is a universal solution for deploying LLMs natively on diverse hardware backends and in native applications. It is a strong option for developers who want to run models on a wide range of devices and hardware configurations.

Tanya Malhotra is a final year undergrad from the University of Petroleum & Energy Studies, Dehradun, pursuing BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with good analytical and critical thinking, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.
