Bailicai: A Domain-Optimized Retrieval-Augmented Generation Framework for Medical Applications
Cui Long, Yongbin Liu, Chunping Ouyang, Ying Yu
TL;DR
The paper addresses the gap between open-source medical LLMs and proprietary models by introducing Bailicai, a domain-optimized retrieval-augmented generation framework designed to reduce hallucinations and noise in medical QA. It combines four modules (Medical Knowledge Injection, Self-Knowledge Boundary Identification, Directed Acyclic Graph Task Decomposition, and Retrieval-Augmented Generation) with MoDS data filtering and LoRA-based fine-tuning of Meta-Llama models. Across five medical benchmarks, Bailicai achieves state-of-the-art or competitive results at compact model scales, outperforming GPT-3.5 and approaching larger domain models on several tasks, while offering privacy advantages through local deployment. The work also reports robust handling of noisy documents, targeted retrieval, and efficient computation, positioning Bailicai as a practical path toward deployable, safe, and high-performing medical AI systems.
Abstract
Large Language Models (LLMs) have exhibited remarkable proficiency in natural language understanding, prompting extensive exploration of their potential applications across diverse domains. In the medical domain, open-source LLMs have demonstrated moderate efficacy following domain-specific fine-tuning; however, they remain substantially inferior to proprietary models such as GPT-4 and GPT-3.5. These open-source models are limited in the comprehensiveness of their domain-specific knowledge and are prone to 'hallucinations' during text generation. To mitigate these issues, researchers have adopted the Retrieval-Augmented Generation (RAG) approach, which supplies LLMs with background information from external knowledge bases while leaving the model's internal parameters unchanged. However, document noise can adversely affect performance, and the application of RAG in the medical field remains in its nascent stages. This study presents the Bailicai framework: a novel integration of retrieval-augmented generation with large language models optimized for the medical domain. The Bailicai framework improves the performance of LLMs in medicine through four sub-modules: Medical Knowledge Injection, Self-Knowledge Boundary Identification, Directed Acyclic Graph Task Decomposition, and Retrieval-Augmented Generation. Experimental results demonstrate that the Bailicai approach surpasses existing medical domain LLMs across multiple medical benchmarks and exceeds the performance of GPT-3.5. Furthermore, the Bailicai method effectively reduces the prevalent issue of hallucinations in medical applications of LLMs and alleviates the noise-related challenges that traditional RAG techniques face when processing irrelevant or pseudo-relevant documents.
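The retrieval idea described in the abstract can be illustrated with a minimal sketch. This is not the Bailicai implementation: the bag-of-words cosine scorer, the `min_score` threshold, the toy corpus, and the `needs_retrieval` flag (a hypothetical stand-in for the output of a self-knowledge boundary check) are all assumptions made for illustration. The sketch shows only that retrieval changes the prompt rather than the model's parameters, and that low-scoring documents can be dropped before they reach the model.

```python
import math
import re
from collections import Counter


def bag(text):
    """Lowercased bag-of-words vector (toy scorer, assumed for illustration)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, corpus, k=2, min_score=0.2):
    """Return up to k documents, dropping those below min_score (noise filter)."""
    q = bag(question)
    scored = sorted(
        ((cosine(q, bag(doc)), doc) for doc in corpus),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return [doc for score, doc in scored[:k] if score >= min_score]


def build_prompt(question, corpus, needs_retrieval):
    """Build the model input; needs_retrieval is a hypothetical gate flag."""
    if not needs_retrieval:
        # The model is assumed to know the answer: skip retrieval entirely.
        return f"Question: {question}\nAnswer:"
    docs = retrieve(question, corpus)
    context = "\n".join(f"- {d}" for d in docs)
    return f"Context:\n{context}\nQuestion: {question}\nAnswer:"


# Toy corpus (invented for this example): one relevant and one irrelevant document.
CORPUS = [
    "Metformin is a first-line drug for type 2 diabetes.",
    "The hospital cafeteria opens at eight.",
]
QUESTION = "What is the first-line drug for type 2 diabetes?"

if __name__ == "__main__":
    print(build_prompt(QUESTION, CORPUS, needs_retrieval=True))
```

In this toy setup the irrelevant document scores below the threshold and is excluded from the context. A real system would replace the word-overlap scorer with a learned retriever and would derive the gate from the model itself rather than from a hand-set flag.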
