Microsoft AI Research Proposes eXtensible Prompt (X-Prompt) for Prompting a Large Language Model (LLM) Beyond Natural Language (NL)
Large language models (LLMs) have become extremely popular in recent years, thanks to their ability to produce text comparable to human-written material and their versatility across natural language processing (NLP) applications. These models can capture correlations and patterns in natural language text that were previously out of reach, enabling practical applications such as question answering, text summarization, and language translation. One key factor in their success is the availability of vast amounts of training data; another is powerful hardware such as graphics processing units (GPUs), which makes it possible to train these models quickly. Their adaptability has also contributed significantly: by fine-tuning a pre-trained model on a smaller dataset relevant to a particular purpose, developers can tailor it to a specific goal, such as sentiment analysis or text classification. As a result, many NLP-based applications can be quickly adapted to particular tasks and use cases.
Recent research shows that language models (LMs) become better in-context learners as their size grows. This emergent ability yields promising results in zero- and few-shot settings: a large LM can be instructed at runtime through a descriptive natural language (NL) prompt to accomplish a specified goal with good out-of-distribution (OOD) robustness. However, it is not always easy to write a sufficiently descriptive prompt, particularly for tasks with fine-grained or intangible criteria. For instance, unless a person's linguistic style is already widely known (e.g., William Shakespeare's), it is hard to describe that style in NL precisely enough to make an LM write in it. To overcome the difficulty of expressing such detailed prompts, the researchers propose the eXtensible Prompt (X-Prompt). X-Prompt differs from an NL prompt in that it introduces an extensible vocabulary of imaginary words, offering an extendable interface that increases the descriptive power of prompts. As Table 1 of the paper shows, X-Prompt can easily introduce an imaginary word representing a particular person's style; this word can then be combined with different prompt contexts to instruct the LM to produce the given content in that person's style.
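One simple way to picture an X-Prompt imaginary word is as a single new, trainable token embedding added to a frozen LM's vocabulary, which can then be mixed freely with ordinary NL text in a prompt. The sketch below illustrates that idea with Hugging Face Transformers; it is not the authors' implementation, and the token name `<style-1>` and the `gpt2` checkpoint are illustrative assumptions.

```python
# Minimal sketch (not the paper's code): realize an X-Prompt "imaginary word"
# as one trainable embedding row added to an otherwise frozen language model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Register the imaginary word and grow the embedding table by one row.
tokenizer.add_tokens(["<style-1>"])          # "<style-1>" is an assumed name
model.resize_token_embeddings(len(tokenizer))
new_id = tokenizer.convert_tokens_to_ids("<style-1>")

# Freeze the whole LM; only the new embedding row should receive updates.
for p in model.parameters():
    p.requires_grad = False
emb = model.get_input_embeddings()
emb.weight.requires_grad = True

def keep_only_new_row(grad):
    # Zero out gradients for every vocabulary row except the imaginary word.
    mask = torch.zeros_like(grad)
    mask[new_id] = 1.0
    return grad * mask

emb.weight.register_hook(keep_only_new_row)

# The imaginary word composes freely with natural-language prompt context.
prompt = "Write in the style of <style-1>: The morning sun rose over the hills"
batch = tokenizer(prompt, return_tensors="pt")
loss = model(**batch, labels=batch["input_ids"].clone()).loss
loss.backward()  # only emb.weight[new_id] ends up with a nonzero gradient
```

Because everything except that single embedding row is frozen, the imaginary word behaves like an extra vocabulary entry whose meaning is learned from data while the surrounding NL prompt keeps its usual interpretation.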
The researchers carry out experiments using style customization as a case study for X-Prompts. They demonstrate that X-Prompt successfully combines the advantages of NL prompts and soft prompts, offering a potentially extensible interface for advanced interaction between people and large LMs, and they show that X-Prompt has strong descriptive power together with good OOD robustness. To ensure that an X-Prompt is as OOD-robust as an NL prompt, they propose context-guided learning with prompt augmentation, which helps the imaginary words learn their general, composable meaning rather than overfitting the in-distribution (ID) training data. In sum, X-Prompt offers a versatile interface for prompting a large language model beyond natural language. Beyond the style customization studied in this work, X-Prompt can extend in-context learning to handle more detailed instructions for language model customization, pointing toward advanced human-LLM interaction such as creative language generation, patching language models with new knowledge of entities and events, and detoxifying and debiasing language generation.
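To give a flavor of the prompt-augmentation side of context-guided learning, the rough sketch below continues from the previous snippet: the same imaginary word is trained under several different NL prompt contexts so that it learns a general meaning that composes with unseen prompts, rather than memorizing one fixed template. The templates, training texts, learning rate, and step count are illustrative placeholders, not values from the paper.

```python
# Rough sketch of prompt augmentation: vary the NL context around the
# imaginary word during training so its embedding does not overfit one
# fixed in-distribution template. Reuses `model`, `tokenizer`, and the
# gradient mask from the previous snippet.
import random
import torch

templates = [
    "Write in the style of <style-1>: {text}",
    "Rewrite the following as <style-1> would: {text}",
    "<style-1> says: {text}",
]
train_texts = ["The storm broke at dawn.", "She opened the old letter."]

# weight_decay=0.0 so the frozen embedding rows are never decayed;
# only the imaginary word's row receives nonzero gradients.
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad],
    lr=1e-3,
    weight_decay=0.0,
)

model.train()
for _ in range(100):
    text = random.choice(train_texts)
    prompt = random.choice(templates).format(text=text)  # vary the context
    batch = tokenizer(prompt, return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"].clone()).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The design intent mirrored here is that the imaginary word should be useful wherever a user drops it into a prompt, so training deliberately exposes it to many contexts instead of a single one.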
Check out the Paper and Github. All Credit For This Research Goes To the Researchers on This Project. Also, don’t forget to join our Reddit Page, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.
Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence from the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.