Lab-AI: Using Retrieval Augmentation to Enhance Language Models for Personalized Lab Test Interpretation in Clinical Medicine

Xiaoyu Wang; Haoyong Ouyang; Balu Bhasuran; Xiao Luo; Karim Hanna; Mia Liza A. Lustria; Carl Yang; Zhe He

TL;DR

This work tackles personalized lab test interpretation by using Retrieval-Augmented Generation (RAG) to provide context-aware normal ranges. Lab-AI comprises two modules, factor retrieval and normal-range retrieval, and uses a MedlinePlus-based vector database to ground LLM outputs in credible sources. Across 122 lab tests (40 with conditional factors, 82 without), GPT-4-turbo with RAG achieves a factor-retrieval F1 of 0.948 and a normal-range retrieval accuracy of 0.995 at the question level and 0.992 at the lab level, outperforming the best non-RAG system by 33.5% on factor retrieval and by 132% and 100% on question-level and lab-level normal-range retrieval, respectively. The results suggest that Lab-AI could improve patient understanding and support shared decision-making; planned future work will broaden the sources, enhance the user interface, and extend evaluation to real-world clinical use.

Abstract

Accurate interpretation of lab results is crucial in clinical medicine, yet most patient portals use universal normal ranges, ignoring conditional factors like age and gender. This study introduces Lab-AI, an interactive system that offers personalized normal ranges using retrieval-augmented generation (RAG) from credible health sources. Lab-AI has two modules: factor retrieval and normal range retrieval. We tested these on 122 lab tests: 40 with conditional factors and 82 without. For tests with factors, normal ranges depend on patient-specific information. Our results show GPT-4-turbo with RAG achieved a 0.948 F1 score for factor retrieval and 0.995 accuracy for normal range retrieval. GPT-4-turbo with RAG outperformed the best non-RAG system by 33.5% in factor retrieval and showed 132% and 100% improvements in question-level and lab-level performance, respectively, for normal range retrieval. These findings highlight Lab-AI's potential to enhance patient understanding of lab results.
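To make the reported metrics concrete, the sketch below shows one way a factor-retrieval F1 score and a relative improvement could be computed. It is an illustration under assumptions, not the authors' evaluation code: the abstract does not say how F1 was aggregated, and the sketch assumes the reported percentages are relative gains over the baseline. The function names `factor_f1` and `relative_improvement`, and the example factor lists, are hypothetical.

```python
def factor_f1(predicted, gold):
    """Set-based precision, recall, and F1 for one lab test's conditional factors.

    Assumption: factors are compared as case-insensitive strings.
    """
    pred = {f.strip().lower() for f in predicted}
    ref = {f.strip().lower() for f in gold}
    if not pred and not ref:
        # A test with no conditional factors, correctly predicted as having none.
        return 1.0, 1.0, 1.0
    tp = len(pred & ref)  # factors that appear in both sets
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, as a percentage."""
    return (new - baseline) / baseline * 100.0


# Hypothetical example: both lists are made up for illustration.
precision, recall, f1 = factor_f1(["Age", "gender", "pregnancy"], ["age", "gender"])

# Hypothetical scores: doubling a baseline corresponds to a 100% relative gain.
gain = relative_improvement(0.8, 0.4)
```

In this made-up case the model returns one spurious factor, so recall is perfect while precision drops, and the F1 score reflects both.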
