Zebra-Llama: A Context-Aware Large Language Model for Democratizing Rare Disease Knowledge

Karthik Soman; Andrew Langdon; Catalina Villouta; Chinmay Agrawal; Lashaw Salta; Braian Peetoom; Gianmarco Bellucci; Orion J Buske

TL;DR

Zebra-Llama is a specialized context-aware language model with high-precision Retrieval Augmented Generation (RAG) capability. Using Ehlers-Danlos Syndrome (EDS) as a case study, it improves on its base model (Llama 3.1-8B-Instruct) in expert-rated thoroughness, accuracy, clarity, and citation reliability on EDS-related queries.

Abstract

Rare diseases present unique challenges in healthcare, often suffering from delayed diagnosis and fragmented information landscapes. The scarcity of reliable knowledge about these conditions makes it difficult for Large Language Models (LLMs) to support clinical management and deliver precise patient information, underscoring the need for focused training on these 'zebra' cases. We present Zebra-Llama, a specialized context-aware language model with high-precision Retrieval Augmented Generation (RAG) capability, focusing on Ehlers-Danlos Syndrome (EDS) as our case study. EDS, affecting 1 in 5,000 individuals, exemplifies the complexities of rare diseases with its diverse symptoms, multiple subtypes, and evolving diagnostic criteria. Zebra-Llama is built with a novel context-aware fine-tuning methodology, trained on questions derived from medical literature, patient experiences, and clinical resources, paired with expertly curated responses. On a test set of real-world questions collected from EDS patients and clinicians, medical experts evaluated the responses generated by Zebra-Llama and by its base model (Llama 3.1-8B-Instruct). Zebra-Llama improved over the base model in thoroughness (77.5% vs. 70.1%), accuracy (83.0% vs. 78.8%), clarity (74.7% vs. 72.0%), and citation reliability (70.6% vs. 52.3%). Released as an open-source resource, Zebra-Llama not only provides more accessible and reliable EDS information but also establishes a framework for developing specialized AI solutions for other rare conditions. This work is a step towards democratizing expert-level knowledge in rare disease management, potentially transforming how healthcare providers and patients navigate the complex landscape of rare diseases.
