This AI Paper from China Introduces RareBench: A Pioneering AI Benchmark to Evaluate the Capabilities of LLMs on 4 Critical Dimensions within Rare Diseases

The remarkable ability of Large Language Models (LLMs) such as ChatGPT to interpret and generate language in a strikingly human-like way has attracted wide interest. As a result, LLM applications in healthcare are quickly becoming an exciting new area of study for AI and clinical medicine researchers. Multiple investigations have examined the potential of LLMs to aid physicians in medical diagnosis, clinical report writing, and medical education. However, the strengths and weaknesses of LLMs in the setting of rare diseases have not yet been adequately studied.

An estimated 80% of the more than 7,000 rare diseases identified so far have a hereditary component. Misdiagnosis or underdiagnosis is common for patients with rare disorders, and it can take years before a confirmed diagnosis is made. Identification and diagnosis are challenging because of the high degree of phenotypic overlap between common and rare diseases, and even among rare diseases themselves. Diagnosing a rare disease usually involves two main steps. To arrive at a preliminary diagnosis, doctors first gather clinical information from patients, such as symptoms, signs, medical history (personal and family), and epidemiological data. Specialized testing, such as laboratory tests or imaging examinations, is then performed to aid in diagnosis and differential diagnosis. Rare diseases often involve multiple organs and systems, so it is helpful to consult experts from diverse domains to get a more complete picture and an accurate diagnosis.

The Human Phenotype Ontology (HPO) has standardized disease phenotype terminology into a hierarchical structure, and the Online Mendelian Inheritance in Man (OMIM), Orphanet, and the Compendium of China’s Rare Diseases (CCRD) are just a few of the knowledge bases dedicated to rare diseases. Nevertheless, diagnostic approaches built on these resources frequently produce subpar results because of limited phenotypic data on many rare diseases in the databases, a shortage of high-quality cases for training and testing, and the assumptions that underpin the methods. Because real-world data is scarce and there are many rare diseases to distinguish, diagnosis becomes a classic few-shot classification problem.

In the difficult area of rare diseases, researchers from Tsinghua University and Peking Union Medical College in Beijing carry out a thorough evaluation of LLMs. In particular, Task 4 (Differential Diagnosis among Universal Rare Diseases) includes the average diagnostic performance of fifty specialist physicians on seventy-five high-quality case records from the PUMCH dataset, along with the 95% confidence interval.

The team provides a varied, multi-institutional dataset tailored to rare diseases. Alongside it, they present RareBench, a comprehensive benchmarking platform for testing LLMs in challenging real-world clinical tasks such as phenotype extraction and differential diagnosis. They build an extensive knowledge graph for rare diseases by integrating rich knowledge sources. By leveraging a disease-phenotype graph and the hierarchical structure of the phenotype graph, they create a new algorithm for dynamic few-shot prompting based on phenotype Information Content (IC) values. In differential diagnosis, this approach greatly improves the performance of LLMs other than GPT-4.
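To make the idea of IC-based dynamic few-shot prompting more concrete, here is a minimal Python sketch. It is not the RareBench implementation: the toy annotations, the phenotype labels `P1` to `P6`, the weighted-Jaccard similarity, and the function names are all assumptions chosen for illustration, and the sketch ignores the phenotype hierarchy that the paper also uses. It only shows the general pattern: rare phenotypes get a higher IC, and the labelled cases most similar to the query under IC weighting are picked as few-shot examples.

```python
import math
from collections import Counter

# Hypothetical toy annotations: disease -> set of phenotype labels.
# These are placeholders, not real HPO terms or RareBench data.
DISEASE_PHENOTYPES = {
    "disease_A": {"P1", "P2", "P3"},
    "disease_B": {"P1", "P4"},
    "disease_C": {"P1", "P2", "P5"},
    "disease_D": {"P1", "P6"},
}

# Hypothetical labelled cases that could serve as few-shot examples.
CASES = [
    {"phenotypes": {"P1", "P2", "P3"}, "diagnosis": "disease_A"},
    {"phenotypes": {"P1", "P4"}, "diagnosis": "disease_B"},
    {"phenotypes": {"P1", "P2", "P5"}, "diagnosis": "disease_C"},
]


def information_content(disease_phenotypes):
    """IC(t) = -log(fraction of diseases annotated with phenotype t)."""
    n = len(disease_phenotypes)
    counts = Counter(t for terms in disease_phenotypes.values() for t in terms)
    return {t: -math.log(c / n) for t, c in counts.items()}


def ic_weighted_similarity(a, b, ic):
    """Weighted Jaccard: IC mass of shared terms over IC mass of all terms."""
    union = sum(ic.get(t, 0.0) for t in a | b)
    if union == 0.0:
        return 0.0
    shared = sum(ic.get(t, 0.0) for t in a & b)
    return shared / union


def select_few_shot(query, cases, ic, k=2):
    """Return the k labelled cases most similar to the query phenotypes."""
    ranked = sorted(
        cases,
        key=lambda c: ic_weighted_similarity(query, c["phenotypes"], ic),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, examples):
    """Format the selected cases and the query into a plain-text prompt."""
    lines = []
    for ex in examples:
        lines.append(f"Phenotypes: {', '.join(sorted(ex['phenotypes']))}")
        lines.append(f"Diagnosis: {ex['diagnosis']}")
    lines.append(f"Phenotypes: {', '.join(sorted(query))}")
    lines.append("Diagnosis:")
    return "\n".join(lines)


if __name__ == "__main__":
    ic = information_content(DISEASE_PHENOTYPES)
    query = {"P1", "P2", "P3"}
    print(build_prompt(query, select_few_shot(query, CASES, ic)))
```

In this toy setup, a phenotype shared by every disease (`P1`) carries no information, so it contributes nothing to the similarity score, while phenotypes seen in a single disease dominate the choice of examples.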

Lastly, the researchers compare GPT-4 with specialist physicians in five fields. According to the results of the study, GPT-4 can currently perform differential diagnosis of rare diseases on par with seasoned specialists.

The team hopes that RareBench will spur further developments and applications of LLMs to address the difficulties associated with clinical diagnosis, particularly for rare diseases.


Check out the Paper. All credit for this research goes to the researchers of this project.



Dhanshree Shenwai is a Computer Science Engineer with experience in FinTech companies covering the Financial, Cards & Payments, and Banking domains, and a keen interest in applications of AI. She is enthusiastic about exploring new technologies and advancements that make everyone’s life easier in today’s evolving world.

