This Artificial Intelligence (AI) Research Shows The Feasibility Of Enabling Conversational Interactions With Mobile UIs Using Large Language Models (LLMs)
In the past few years, language models have become the talk of the town. These models process and generate natural language text, and they power some ground-breaking AI applications. Large Language Models (LLMs) such as GPT-3, T5, and PaLM have shown markedly stronger performance than earlier language models. They have begun to imitate humans by learning to read, complete code, summarize, and generate text. GPT-3, developed by OpenAI, is a prominent example. It uses a transformer architecture to process text, resulting in a model that can produce content and answer questions much as a human would.
Researchers have long studied how natural language can be used to interact with computing devices. Recently, LLMs have shown promise for such interaction without requiring task-specific models or huge datasets. With that in mind, a group of researchers has published a paper exploring the practicality and feasibility of using a single Large Language Model to enable conversational interaction with a mobile Graphical User Interface (GUI). Previous studies addressed only individual components of conversational interaction with a mobile User Interface (UI), and each required task-specific models, massive datasets, and considerable training effort. There had also been little progress in applying LLMs to GUI interaction tasks. The researchers have now shown how LLMs can support diverse interactions with mobile UIs, using a set of prompting techniques designed to adapt an LLM to a mobile UI.
The team developed the prompting methods so that interaction designers can quickly prototype and test novel language interactions with users. LLMs could thereby change how conversational interaction designs are developed and put into practice, saving considerable time, effort, and money compared with building dedicated models and datasets. The researchers also designed an algorithm that converts Android view hierarchy data into HTML syntax. Because HTML is already well represented in LLM training data, this representation helps LLMs adapt to mobile UIs.
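To make the view-hierarchy-to-HTML idea concrete, here is a minimal Python sketch. It is not the researchers' algorithm: the dictionary layout of a view node, the `TAG_FOR_CLASS` mapping, and the choice of attributes (`id`, `class`, `alt`, `placeholder`) are all assumptions made for illustration.

```python
from html import escape

# Hypothetical mapping from Android widget class names to HTML tags.
# Any class not listed here is treated as a generic container (<div>).
TAG_FOR_CLASS = {
    "TextView": "p",
    "Button": "button",
    "EditText": "input",
    "ImageView": "img",
}


def view_to_html(node, counter=None):
    """Convert a view-hierarchy node (a nested dict) to an HTML string.

    Nodes are visited depth-first and numbered in visit order, so a model
    could refer to an element by its numeric id.
    """
    if counter is None:
        counter = [0]  # mutable counter shared across recursive calls

    tag = TAG_FOR_CLASS.get(node.get("class", ""), "div")
    attrs = f' id="{counter[0]}"'
    counter[0] += 1
    if node.get("resource_id"):
        attrs += f' class="{escape(node["resource_id"])}"'

    text = escape(node.get("text", ""))
    children = "".join(
        view_to_html(child, counter) for child in node.get("children", [])
    )

    # Void elements carry their text in an attribute instead of a body.
    if tag == "img":
        return f'<img{attrs} alt="{text}" />'
    if tag == "input":
        return f'<input{attrs} placeholder="{text}" />'
    return f"<{tag}{attrs}>{text}{children}</{tag}>"


# A made-up sign-in screen used only to exercise the function.
screen = {
    "class": "LinearLayout",
    "children": [
        {"class": "TextView", "text": "Sign in"},
        {"class": "EditText", "resource_id": "email", "text": "Email"},
        {"class": "Button", "text": "Next"},
    ],
}
print(view_to_html(screen))
```

The resulting string can be placed directly into a prompt, which is the point of the conversion: the screen is described in a syntax the model has likely seen many times during training.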
The researchers experimented with four modeling tasks to assess the feasibility of their approach. These are – Screen Question Generation, Screen Summarization, Screen Question-Answering, and Mapping Instruction to UI Action. The results showed that their approach achieves competitive performance using only two examples per task.
- Screen Question Generation – LLMs outperformed previous approaches by leveraging the UI context to generate questions for input fields.
- Screen Summarization – Compared to the benchmark model (Screen2Words, UIST ’21), the study found that LLMs can effectively summarize the essential functionalities of a mobile UI and produce more accurate summaries.
- Screen Question-Answering – Compared to an off-the-shelf QA model that correctly answers 36% of questions, the 2-shot LLM produced Exact Match answers for 66.7% of questions.
- Mapping Instruction to UI Action – LLMs predict the UI object required to perform the instructed action. The model did not outperform the benchmark model, but it achieved strong results with only two shots.
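The two-shot setup and the Exact Match score mentioned above can be sketched in Python as follows. This is only an illustration under stated assumptions: the prompt layout (`Screen:` / `Question:` / `Answer:`), the function names `build_prompt` and `exact_match_rate`, the sample screens, and the whitespace and case normalization are all hypothetical, and the paper's actual prompts and metric definition may differ. No model is called here; the code only assembles the prompt text and scores predictions that are already available.

```python
def build_prompt(exemplars, target_html, question, preamble=""):
    """Join solved exemplars with the unsolved target screen.

    Each exemplar is a dict with 'html', 'question', and 'answer' keys
    (a hypothetical layout). The prompt ends with an open "Answer:" slot
    for the model to complete.
    """
    parts = [preamble] if preamble else []
    for ex in exemplars:
        parts.append(
            f"Screen: {ex['html']}\n"
            f"Question: {ex['question']}\n"
            f"Answer: {ex['answer']}"
        )
    parts.append(f"Screen: {target_html}\nQuestion: {question}\nAnswer:")
    return "\n\n".join(parts)


def exact_match_rate(predictions, references):
    """Fraction of predictions equal to their reference answer.

    Assumption: answers are compared after lowercasing and collapsing
    whitespace.
    """
    def normalize(s):
        return " ".join(s.lower().split())

    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


# Two made-up exemplars, i.e. a "2-shot" prompt.
exemplars = [
    {"html": '<p id="0">Flight departs Dec 23rd</p>',
     "question": "When does the flight depart?", "answer": "Dec 23rd"},
    {"html": '<p id="0">Opens at 9 am</p>',
     "question": "When does it open?", "answer": "9 am"},
]
prompt = build_prompt(exemplars, '<p id="0">Total: $12</p>', "What is the total?")
print(prompt)
```

Swapping in exemplars for a different task (for example, screen summaries instead of question-answer pairs) changes the behavior of the same model without any retraining, which is what makes this approach attractive for rapid prototyping.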
Enabling natural language interaction with computing devices has been a long-standing pursuit in human-computer interaction. Studies like this one could help make it practical and mark meaningful progress for Artificial Intelligence.
Check out the Paper. All credit for this research goes to the researchers on this project.
Tanya Malhotra is a final year undergrad from the University of Petroleum & Energy Studies, Dehradun, pursuing BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with strong analytical and critical thinking skills, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.