New AI Research Presents a Prompt-Centric Approach for Analyzing Large Language Model (LLM) Capabilities

The recent rise of large language models (LLMs) has transformed the field of natural language processing (NLP), particularly the practice of prompting LLMs to generate open-ended text. The applications of open-ended text generation are far-reaching, spanning domains such as question answering, story generation, code generation, human-assisted creativity, and open-ended dialogue.

As these models proliferate, concern about their unpredictability is growing, and with it the need for a better understanding of their capabilities and limitations.

Researchers at the Georgia Institute of Technology, Shanghai Jiao Tong University, Google, and Stanford University have created a prompt taxonomy to analyze open text generation. They experimented with 288 prompts and evaluated over 3000 outputs, analyzing mitigation strategies and future research directions.


To analyze the capabilities and limitations of language models on open text generation, the researchers created a taxonomy of individual constraints based on how users naturally express constraints in prompts. For each constraint, they designed a set of simple, natural base prompts and varied them along dimensions such as subject and prompt template to mitigate prompt variance.

Constraints in prompts fall into two categories: stylistic constraints, which bound the output's style (such as writing in a flowery style), and structural constraints, which bound the output's structure (such as limiting the number of words).

The researchers created 288 prompts and generated outputs using GPT-3, OPT, BLOOM, and GLM. They generated ten outputs per prompt to evaluate. For example, a base prompt for the stylistic constraint “mood” is “Write a passage about love that makes the reader feel [angry, fearful, happy, sad].”
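The base-prompt-plus-variation scheme described above can be sketched as a simple cross product. This is an illustrative reconstruction, not the authors' actual code: the templates, subjects, and moods below are hypothetical examples in the spirit of the paper's "mood" constraint.

```python
import itertools

# Hypothetical prompt templates and variation dimensions, modeled on the
# paper's "mood" stylistic constraint. The exact wording used in the study
# may differ; only the first template mirrors the base prompt quoted above.
TEMPLATES = [
    "Write a passage about {subject} that makes the reader feel {mood}.",
    "Compose a short text on {subject} that leaves the reader feeling {mood}.",
]
SUBJECTS = ["love", "shopping", "cars"]
MOODS = ["angry", "fearful", "happy", "sad"]

def build_prompts(templates, subjects, moods):
    """Cross every template with every subject/mood pair, so each
    constraint is probed under several surface forms."""
    return [
        t.format(subject=s, mood=m)
        for t, s, m in itertools.product(templates, subjects, moods)
    ]

prompts = build_prompts(TEMPLATES, SUBJECTS, MOODS)
print(len(prompts))  # 2 templates x 3 subjects x 4 moods = 24 variants
```

Varying template and subject this way lets a failure be attributed to the constraint itself rather than to one particular phrasing.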

Source: https://github.com/SALT-NLP/Bound-Cap-LLM

Stylistic Constraints

The researchers found that GPT-3 struggles with certain challenging stylistic constraints, such as comedy, satire, irony, and literary fiction, and is sensitive to style-subject pairings. When a prompt is too challenging, GPT-3 confuses style with subject, and it struggles with style words that are not unique to creative writing.

However, the model’s performance is not correlated with the prompt difficulty perceived by annotators, indicating that the factors contributing to prompt difficulty differ between humans and LLMs. This highlights the importance of empirically finding which prompts are and are not challenging for LLMs.

Structural Constraints

While GPT-3 generally understands structural constraints in writing, it struggles with numerical constraints such as required word or sentence counts, often producing close but not exact outputs. The model also shows high variance in output length when prompted with descriptive structural constraints such as "long."
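The "close but not exact" behavior is easy to measure automatically. Below is a minimal sketch of such a check; the function name, the tolerance parameter, and the sample output are illustrative assumptions, not part of the paper's evaluation code.

```python
import re

def check_word_count(text, target, tolerance=0):
    """Return True if the text's word count is within `tolerance`
    of the requested target (tolerance=0 means an exact match)."""
    words = re.findall(r"\b\w+\b", text)
    return abs(len(words) - target) <= tolerance

# A hypothetical model output of 12 words for a "write 10 words" prompt:
output = "The quick brown fox jumps over the lazy dog near the river."
print(check_word_count(output, target=10))               # False: not exact
print(check_word_count(output, target=10, tolerance=2))  # True: close enough
```

A strict check fails the output, while a relaxed one passes it, mirroring the paper's observation that GPT-3 usually lands near, but not on, the requested count.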

Additionally, GPT-3 fails to properly format academic papers, likely due to the lack of clear labeling for such documents in its training data.

The authors applied their methodology to three other LLMs, OPT-175B, BLOOM-176B, and GLM-130B, using the same prompts plus additional numerical structural constraint prompts. They found that these models performed worse than GPT-3, with more than half of their generated outputs being degenerate.
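The paper judged degenerate outputs via human evaluation; a rough automatic proxy is to flag text dominated by repeated tokens. The heuristic below, including its threshold, is purely an illustrative assumption and not the authors' criterion.

```python
def is_degenerate(text, min_unique_ratio=0.5):
    """Rough heuristic: flag text as degenerate when the fraction of
    unique tokens falls below a threshold (i.e., heavy repetition).

    Illustrative stand-in only; the paper used human judgments.
    """
    tokens = text.split()
    if not tokens:
        return True
    return len(set(tokens)) / len(tokens) < min_unique_ratio

samples = [
    "love love love love love love",  # repetitive -> flagged
    "The sea was calm and the moon rose slowly over the quiet harbor.",
]
print([is_degenerate(s) for s in samples])  # [True, False]
```

Such a filter can cheaply triage thousands of generations before human annotators review the remainder.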

Comments

The paper presents a methodology for analyzing language models' ability to generate open-ended text under structural and stylistic constraints. The results surface failures that align with previously noted model challenges, as well as new failure patterns, across both kinds of constraints.

The authors also provide mitigations that consistently improve performance across both domains. The paper acknowledges some limitations, including that the taxonomy does not cover all aspects of stylistic and structural constraints and is not representative of all open-text generations. 

The authors also note ethical considerations, such as the potential for style misuse and annotator harm, and suggest guidelines to protect annotators. Overall, the methodology and findings presented in the paper contribute to understanding language models’ capabilities and limitations.




I am a Civil Engineering Graduate (2022) from Jamia Millia Islamia, New Delhi, and I have a keen interest in Data Science, especially Neural Networks and their application in various areas.


