With Just ~20 Lines of Python Code, You Can Do ‘Retrieval Augmented GPT Based QA’ Using This Open Source Repository Called PrimeQA

On Mar 3, 2023

Over the past few years, Question Answering (QA) has drawn keen interest from researchers in Natural Language Processing. Most QA pipelines are built from two components: an information retrieval (IR) system, also known as a retriever, and a machine reading comprehension (MRC) system, also known as a reader. The pipeline’s input is typically a query and a large document collection, from which the retriever extracts the passages most pertinent to the query. The reader then mines those passages for a precise answer, which is returned as the pipeline’s final output. With the arrival of better pre-trained language models and more advanced algorithms for both the retriever and reader components, the QA research field has made remarkable progress.
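The retriever-then-reader flow described above can be illustrated with a small sketch. This is a toy written for this article, not PrimeQA code: the function names `retrieve`, `read`, and `answer` are hypothetical, and simple term overlap stands in for the trained models a real pipeline would use.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, passages, top_k=1):
    """Toy retriever (assumption): rank passages by shared query terms."""
    query_terms = set(tokenize(query))
    scored = []
    for passage in passages:
        overlap = len(query_terms & set(tokenize(passage)))
        scored.append((overlap, passage))
    # Highest overlap first; the sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in scored[:top_k]]


def read(query, passage):
    """Toy extractive reader (assumption): pick the best-matching sentence."""
    query_terms = set(tokenize(query))
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", passage) if s.strip()]
    return max(sentences, key=lambda s: len(query_terms & set(tokenize(s))))


def answer(query, passages):
    """Pipeline: retrieve the most relevant passage, then read an answer from it."""
    best_passage = retrieve(query, passages, top_k=1)[0]
    return read(query, best_passage)
```

Calling `answer("Who developed PrimeQA?", passages)` on a small list of passages returns the single sentence that best matches the query. In a real system, both stages would be replaced by learned models, but the hand-off between them stays the same.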

Although the QA field has advanced rapidly over the past few years, there is still significant room for improvement. Researchers currently lack a centralized repository that makes it easy to train and analyze various state-of-the-art models in large-scale QA experiments. To create a one-stop solution for QA research, with the long-term aim of democratizing the field through easy replicability, a team from IBM Research AI developed a QA repository known as ‘The Prime Repository for State-of-the-Art Multilingual Question Answering Research and Development’, or PrimeQA. It is an open-source repository that gives academics and researchers the tools they need to build a custom QA application quickly. Using PrimeQA, a researcher can obtain pre-trained models from various online sources and use them to run the experiments described in a paper published at the most recent NLP conference.

PrimeQA was designed around several principles, including reproducibility and customization. Users can blend different approaches with their respective companion modules to replicate state-of-the-art published results, for instance by combining a reader with a retriever, as is done in many QA pipelines. PrimeQA also supports customization, so researchers can extend its models to suit the needs of their applications and use their own data in any of the data formats the repository supports. To make it simpler for developers to deploy pre-trained off-the-shelf models quickly, PrimeQA also includes many reusable components. As a result, less code modification is needed, which saves both time and labor. Moreover, PrimeQA models are built on top of Transformers, making them easy to integrate with Hugging Face Datasets and the Model Hub.

PrimeQA is an end-to-end toolbox consisting of user-friendly implementations of state-of-the-art retrievers and readers at the top of major QA leaderboards. It can perform training, inference, and performance evaluation of these models. Moreover, a number of sibling repositories offer tools for tying together different retrievers and readers and for building a front-end user interface (UI) for customers. PrimeQA supports core QA functionalities such as information retrieval and reading comprehension, as well as auxiliary capabilities such as question generation, which are described in detail below:

1. Information Retrieval: PrimeQA includes extensions for both dense (such as ColBERT) and sparse (such as BM25) retrievers. The repository provides a single Python script that switches between retriever algorithms when passed additional arguments.

2. Reading Comprehension: Given a query and a retrieved paragraph, the reader component predicts an answer that is either extracted directly from the context or generated based on it. PrimeQA allows the training and inference of extractive and generative readers via a single Python script.

3. Question Generation: Question generation is a powerful method for enhancing the generalization of QA models. Modern sequence-to-sequence generation architectures are the foundation of PrimeQA’s QG component, which accepts unstructured and structured input text through a single Python script.
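To make the sparse retrieval mentioned in item 1 more concrete, here is a minimal sketch of Okapi BM25 scoring in plain Python. It is not PrimeQA's implementation: the function name `bm25_scores`, the tokenizer, and the default values `k1=1.5` and `b=0.75` are assumptions chosen for illustration, and the IDF uses the common "+1 inside the log" variant so that scores never go negative.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query.

    k1 controls term-frequency saturation and b controls length
    normalization; both defaults are illustrative assumptions.
    """
    docs = [tokenize(d) for d in documents]
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for d in docs:
        doc_freq.update(set(d))

    query_terms = tokenize(query)
    scores = []
    for d in docs:
        term_freq = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in term_freq:
                continue
            df = doc_freq[term]
            # Rarer terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            tf = term_freq[term]
            # Longer-than-average documents are penalized through b.
            norm = tf + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Sorting documents by these scores gives a sparse ranking based purely on lexical overlap. A dense retriever such as ColBERT instead compares learned embeddings, which is why a single toolkit that exposes both behind one entry point is convenient for experiments.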

To sum up, PrimeQA is an open-source library created by QA researchers and developers to make it simple to replicate and reuse past and present work. With contributions from significant academic institutions, PrimeQA already has a strong developer community and welcomes participation from both newcomers and professionals. Its reusability and ease of access have attracted a lot of attention, helping the library grow into a key tool for rapid progress in the QA community.

Check out the Paper and Github. All Credit For This Research Goes To the Researchers on This Project.

Khushboo Gupta is a consulting intern at MarktechPost. She is currently pursuing her B.Tech from the Indian Institute of Technology (IIT), Goa. She is passionate about the fields of Machine Learning, Natural Language Processing and Web Development. She enjoys learning more about the technical field by participating in several challenges.