Researchers from Future House and Oxford Created BioPlanner: An Automated AI Approach for Assessing and Training the Protocol-Planning Abilities of LLMs in Biology

On Jan 13, 2024

Large Language Models (LLMs) generally struggle with multi-step problems and long-term planning, both of which are essential for designing scientific experiments. A recent study introduces BioPlanner, a method that addresses the challenge of automatically generating accurate protocols for scientific experiments. Researchers from Align to Innovate, the Francis Crick Institute, Future House, and the University of Oxford introduced an automatic evaluation framework, along with a dataset called BIOPROT1, to assess and improve the protocol-planning abilities of LLMs. BIOPROT1 focuses specifically on biology protocols, and the researchers aim to extend the approach to other fields of science.

Generating scientific protocols is challenging for several reasons: descriptions vary widely, small details can matter a great deal, and there are no established metrics for evaluation. Traditional methods in biology research are also time-consuming and error-prone. The BIOPROT1 dataset consists of biology protocols collected from Protocols.io, filtered, and translated into pseudocode. The approach uses an LLM to generate a set of admissible actions (pseudocode functions) and a pseudocode version of each protocol, then evaluates how well an LLM can reconstruct that pseudocode from a high-level description of the protocol and the list of admissible pseudocode functions.
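To make the pseudocode idea concrete, here is a minimal Python sketch of how a protocol might be represented as calls to a fixed set of admissible functions, and how a generated protocol could be checked against that set. The function names (add_reagent, incubate, centrifuge) and their parameters are hypothetical, chosen for illustration; they are not taken from BIOPROT1 or from the researchers' code.

```python
from dataclasses import dataclass, field

# Hypothetical admissible functions for one protocol: name -> allowed parameters.
# These are illustrative only, not entries from the BIOPROT1 dataset.
ADMISSIBLE = {
    "add_reagent": ("reagent", "volume_ul", "target"),
    "incubate": ("target", "temperature_c", "duration_min"),
    "centrifuge": ("target", "speed_rpm", "duration_min"),
}


@dataclass
class Step:
    """One pseudocode call, e.g. incubate(target='tube_1', ...)."""
    function: str
    args: dict = field(default_factory=dict)

    def __str__(self) -> str:
        rendered = ", ".join(f"{k}={v!r}" for k, v in self.args.items())
        return f"{self.function}({rendered})"


def inadmissible_steps(steps: list[Step]) -> list[Step]:
    """Return steps that call an unknown function or pass an unknown argument.

    Missing arguments are not checked in this sketch.
    """
    bad = []
    for step in steps:
        allowed = ADMISSIBLE.get(step.function)
        # Flag unknown functions, or argument names outside the allowed set.
        if allowed is None or not set(step.args) <= set(allowed):
            bad.append(step)
    return bad


protocol = [
    Step("add_reagent", {"reagent": "lysis buffer", "volume_ul": 200, "target": "tube_1"}),
    Step("incubate", {"target": "tube_1", "temperature_c": 37, "duration_min": 10}),
    Step("vortex", {"target": "tube_1"}),  # not in the admissible set
]

for step in protocol:
    print(step)
print("inadmissible:", [s.function for s in inadmissible_steps(protocol)])
```

Restricting a model to a known function vocabulary is what makes its output machine-checkable: each step either uses an admissible call with recognized arguments or it does not, with no need to judge free-form prose.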

BioPlanner uses GPT-4 to convert natural-language protocols into pseudocode, which provides a structured representation that makes evaluation easier. For each protocol, the framework defines a set of protocol-specific pseudofunctions, generates pseudocode that uses them, and then evaluates how well a model reconstructs that pseudocode. The researchers explore multiple tasks, including next-step prediction, full protocol generation, and function retrieval, using shuffled input functions and feedback loops for error detection. After verifying the BIOPROT1 dataset, the researchers show experimentally that pseudocode representations enable more robust evaluation metrics, avoiding the limitations of metrics based on n-gram overlap or contextual embeddings.
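Because each protocol becomes a sequence of function calls, a model's output can be scored by comparing function names directly instead of comparing text. The sketch below assumes three simple metrics, chosen for illustration rather than copied from the paper: precision and recall for function retrieval, a normalized Levenshtein (edit) distance over the sequence of function names for full protocol generation, and exact-match accuracy for next-step prediction. The protocol steps in the example are hypothetical.

```python
def precision_recall(predicted: set[str], reference: set[str]) -> tuple[float, float]:
    """Precision and recall of retrieved function names against the reference set."""
    if not predicted or not reference:
        return 0.0, 0.0
    hits = len(predicted & reference)
    return hits / len(predicted), hits / len(reference)


def levenshtein(a: list[str], b: list[str]) -> int:
    """Edit distance between two sequences of function names."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute x -> y
        prev = curr
    return prev[-1]


def normalized_order_distance(pred: list[str], ref: list[str]) -> float:
    """Edit distance scaled to [0, 1] by the longer sequence; 0 means identical order."""
    longest = max(len(pred), len(ref))
    return levenshtein(pred, ref) / longest if longest else 0.0


def next_step_accuracy(predictions: list[str], targets: list[str]) -> float:
    """Fraction of next-step predictions whose function name matches the target.

    Missing predictions (a shorter list) count as wrong.
    """
    if not targets:
        return 0.0
    return sum(p == t for p, t in zip(predictions, targets)) / len(targets)


reference = ["add_reagent", "incubate", "centrifuge", "discard_supernatant"]
predicted = ["add_reagent", "centrifuge", "incubate", "discard_supernatant"]

print(precision_recall(set(predicted), set(reference)))  # (1.0, 1.0): same functions
print(normalized_order_distance(predicted, reference))   # 0.5: two steps out of order
```

In this example, retrieval looks perfect because the same functions are used, while the order distance still flags the swapped steps; scores computed over function names are also unaffected by how a model happens to phrase its output.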

BioPlanner addresses the critical problem of automating the design of scientific experiment protocols using advanced language models. Evaluation on the BIOPROT1 dataset shows that pseudocode representations allow a more accurate and robust evaluation of LLMs. As expected, GPT-4 outperforms GPT-3.5 on various tasks, suggesting progress in long-term planning and multi-step problem solving. The real-world validation, in which an LLM-generated protocol was successfully executed in a laboratory, underscores the practical utility of the proposed method.

Check out the Paper. All credit for this research goes to the researchers of this project.

Pragati Jhunjhunwala is a consulting intern at MarktechPost. She is currently pursuing her B.Tech from the Indian Institute of Technology (IIT), Kharagpur. She is a tech enthusiast with a keen interest in software and data science applications, and she is always reading about developments in different fields of AI and ML.