Meet FACTOOL: A Task and Domain Agnostic Framework for Detecting Factual Errors of Texts Generated by Large Language Models (e.g., ChatGPT)

On Jul 29, 2023

GPT-4 is an example of generative artificial intelligence (AI) technology, which unifies several natural language processing tasks into a single sequence generation problem. With exceptional efficiency and interactivity, this unified architecture lets users carry out a variety of tasks (including code generation, math problem solving, and the drafting of scientific publications) through a natural language interface. However, this generative paradigm also brings its own challenges. Because of the limitations of large language models (LLMs), automatically generated text frequently contains errors or departs from the truth.

LLMs are prone to producing content that sounds convincing but is factually inaccurate or imprecise. This limits the use of generative AI in high-stakes fields such as healthcare, finance, and law. To improve the usefulness and reliability of generated content, these errors must be identified systematically. The current literature on detecting and mitigating factual errors produced by machine learning models focuses on single, specific tasks: for example, retrieval-augmented verification models for question answering (QA), hallucination detection models for text summarization, and execution-based evaluation models for code.

These approaches have shown success in their respective fields. Still, given the extraordinary flexibility of the tasks and domains handled by LLMs, it is also crucial to have a more comprehensive factuality detection and verification framework that is equally adaptable. Additionally, the current literature usually reduces the problem of factuality detection to either (i) assessing whether a given claim is factually accurate or (ii) detecting whether a generated claim is supported by given evidence.

This task definition is a poor fit for the writing tasks that users commonly carry out with generative models (such as ChatGPT), where they often need to assess the factuality of a long-form generation that comes without explicit claims or evidence. In this study, researchers from Shanghai Jiao Tong University, Carnegie Mellon University, City University of Hong Kong, New York University, Meta AI, The Hong Kong University of Science and Technology, and Shanghai Artificial Intelligence Laboratory present FACTOOL, a task- and domain-agnostic framework that detects factual errors in text generated by LLMs. As shown in Fig. 1, they connect the concepts of “tool use” and “factuality detection” and argue that an LLM’s ability to use tools is essential for factuality detection.

**Figure 1:** Framework for factuality detection with tool augmentation.

To gather evidence about the factuality of generated content, FACTOOL uses a variety of tools, such as Google Search, Google Scholar, code interpreters, Python, or even LLMs themselves. The framework then uses the LLMs’ reasoning abilities to judge the content’s factuality in light of the evidence collected. The researchers build a benchmark and run experiments on four tasks:

Knowledge-based question answering (QA)
Code generation
Solving mathematical problems
Writing scientific literature reviews
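
As a rough illustration of the idea described above (collect evidence with a tool suited to the task, then judge the claim against that evidence), here is a minimal Python sketch. It is not FACTOOL's implementation. The names `Claim`, `Verdict`, `search_tool`, `python_tool`, `verify`, and `check` are hypothetical, the search tool is a toy lookup table rather than Google Search, the claims are supplied by hand rather than extracted by an LLM, and a plain string comparison stands in for the LLM's judgment.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Claim:
    task: str    # which kind of tool to use: "kbqa" or "math" (hypothetical labels)
    query: str   # what to ask the tool
    stated: str  # the answer asserted in the generated text


@dataclass
class Verdict:
    claim: Claim
    evidence: str
    factual: bool


# Assumption: a tiny offline table stands in for a real search engine.
TOY_KNOWLEDGE = {"capital of france": "Paris"}


def search_tool(query: str) -> str:
    """Return evidence for a knowledge query, or "" if nothing is found."""
    return TOY_KNOWLEDGE.get(query.lower(), "")


def python_tool(query: str) -> str:
    """Evaluate an arithmetic expression and return the result as evidence."""
    # Illustration only: eval with empty builtins is not a security sandbox.
    return str(eval(query, {"__builtins__": {}}, {}))


# Route each task type to the tool that can produce evidence for it.
TOOLS: Dict[str, Callable[[str], str]] = {
    "kbqa": search_tool,
    "math": python_tool,
}


def verify(claim: Claim) -> Verdict:
    """Collect evidence with the task's tool, then compare it to the claim."""
    evidence = TOOLS[claim.task](claim.query)
    # Assumption: exact string match replaces the LLM-based judgment step.
    # A claim with no evidence is treated as unsupported.
    factual = evidence.strip().lower() == claim.stated.strip().lower()
    return Verdict(claim, evidence, factual)


def check(claims: List[Claim]) -> List[Verdict]:
    """Verify every claim independently."""
    return [verify(c) for c in claims]


claims = [
    Claim("kbqa", "capital of France", "Paris"),
    Claim("math", "12 * 12", "124"),
]
for v in check(claims):
    print(v.claim.query, "->", repr(v.evidence), "| factual:", v.factual)
```

In this toy run the first claim is marked as supported, while the second is flagged because the Python tool computes 144 rather than the stated 124. The dictionary of tools is the part meant to convey the "task- and domain-agnostic" idea: supporting a new task means registering another evidence source, not rewriting the verification loop.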

They revisit the task of factuality detection and extend it to allow a more thorough audit of the latest generative AI models. By integrating “tool use” with “factuality detection,” they provide a unified and adaptable framework for factuality detection across domains and tasks. According to their analysis of contemporary chatbots using FACTOOL, GPT-4 has the highest factuality in almost all scenarios. On knowledge-based question answering, fine-tuned chatbots such as Vicuna-13B show reasonable factuality, but they struggle with more difficult tasks such as writing scientific literature reviews and solving math problems.


Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.

