Health-LLM: Personalized Retrieval-Augmented Disease Prediction System
Qinkai Yu, Mingyu Jin, Dong Shu, Chong Zhang, Lizhou Fan, Wenyue Hua, Suiyuan Zhu, Yanda Meng, Zhenting Wang, Mengnan Du, Yongfeng Zhang
TL;DR
The paper tackles personalized disease prediction from patient health reports by integrating large language models with retrieval-augmented feature extraction and machine learning. It introduces Health-LLM, a pipeline that uses in-context symptom feature generation, Llama Index-based scoring, RAG-grounded knowledge, and CAAFE-driven feature engineering to feed an XGBoost predictor. Empirical results on the IMCS-21 dataset show Health-LLM surpassing GPT-4 and fine-tuned LLaMA-2 baselines, with an accuracy of 0.833 and an F1 score of 0.762; ablation studies confirm the contributions of knowledge indexing and CAAFE. The work demonstrates a practical, knowledge-grounded approach to personalized health management using LLMs and retrieval-augmented reasoning, with potential for real-world clinical deployment.
Abstract
Recent advancements in artificial intelligence (AI), especially large language models (LLMs), have significantly advanced healthcare applications and demonstrated potential in intelligent medical treatment. However, conspicuous challenges remain, such as vast data volumes and inconsistent symptom characterization standards, which prevent full integration of healthcare AI systems with individual patients' needs. To promote professional and personalized healthcare, we propose an innovative framework, Health-LLM, which combines large-scale feature extraction and medical knowledge trade-off scoring. Compared to traditional health management applications, our system has three main advantages: (1) it integrates health reports and medical knowledge into a large model, so that relevant questions can be posed to the large language model for disease prediction; (2) it leverages a retrieval-augmented generation (RAG) mechanism to enhance feature extraction; (3) it incorporates a semi-automated feature updating framework that can merge and delete features to improve the accuracy of disease prediction. We experiment on a large number of health reports to assess the effectiveness of the Health-LLM system. The results indicate that the proposed system surpasses existing ones and has the potential to significantly advance disease prediction and personalized health management.
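To make the third advantage concrete, the following is a minimal sketch of what a "merge and delete" feature update could look like on a table of per-report symptom scores. It is not the paper's CAAFE-based procedure: the function name `update_features`, the thresholds `min_spread` and `merge_corr`, the delete-if-nearly-constant rule, and the merge-by-averaging rule are all assumptions chosen purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def update_features(table, min_spread=0.05, merge_corr=0.95):
    """Hypothetical feature update step (illustrative, not the paper's method).

    table maps a feature name to a list of scores, one per health report.
    Delete: drop features whose scores barely vary across reports.
    Merge: average groups of features that are almost perfectly correlated.
    """
    # Delete step: a feature with (almost) the same score everywhere
    # carries no information for telling reports apart.
    kept = {
        name: col
        for name, col in table.items()
        if max(col) - min(col) >= min_spread
    }

    # Merge step: greedily group each remaining feature with later
    # features that are highly correlated with it, then average the group.
    merged = {}
    used = set()
    names = list(kept)
    for i, a in enumerate(names):
        if a in used:
            continue
        used.add(a)
        group = [a]
        for b in names[i + 1:]:
            if b not in used and pearson(kept[a], kept[b]) >= merge_corr:
                group.append(b)
                used.add(b)
        columns = [kept[g] for g in group]
        merged["+".join(group)] = [sum(v) / len(v) for v in zip(*columns)]
    return merged


# Made-up scores for four reports; the values are illustrative only.
scores = {
    "cough": [0.9, 0.1, 0.8, 0.2],
    "sore_throat": [0.8, 0.0, 0.7, 0.1],
    "fever": [0.2, 0.9, 0.7, 0.1],
    "age_recorded": [1.0, 1.0, 1.0, 1.0],
}
updated = update_features(scores)
print(sorted(updated))
```

In this toy input, the constant `age_recorded` column is deleted and the two closely tracking columns `cough` and `sore_throat` are merged into one averaged feature, while `fever` is kept as is. In the system described here, the resulting feature table would then be passed to the downstream predictor.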
