Pareto-Optimized Open-Source LLMs for Healthcare via Context Retrieval
Jordi Bayarri-Planas, Ashwin Kumar Gururajan, Dario Garcia-Gasulla
TL;DR
The paper addresses the high cost and limited accessibility of proprietary LLMs for healthcare AI by pairing open-source models with optimized context retrieval (CR). It introduces a reproducible CR pipeline, evaluates SC-CoT prompting, and builds an ecosystem of open resources for training and deploying cost-effective healthcare AI. It also proposes OpenMedQA, a benchmark for open-ended medical QA, which reveals a performance gap relative to multiple-choice QA (MCQA) and shows that DeepSeek-R1 thinking data and enhanced retrieval can bridge the gap for smaller models, bringing them close to proprietary performance at lower cost. The results shift the cost-accuracy Pareto frontier on MedQA, with practical implications for scalable, affordable, and reliable healthcare AI in resource-constrained settings.
Abstract
This study leverages optimized context retrieval to enhance open-source Large Language Models (LLMs) for cost-effective, high-performance healthcare AI. We demonstrate that this approach achieves state-of-the-art accuracy on medical question answering at a fraction of the cost of proprietary models, significantly improving the cost-accuracy Pareto frontier on the MedQA benchmark. Key contributions include: (1) OpenMedQA, a novel benchmark revealing a performance gap in open-ended medical QA compared to multiple-choice formats; (2) a practical, reproducible pipeline for context retrieval optimization; and (3) open-source resources (Prompt Engine, CoT/ToT/Thinking databases) to empower healthcare AI development. By advancing retrieval techniques and QA evaluation, we enable more affordable and reliable LLM solutions for healthcare.
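To make the notion of a cost-accuracy Pareto frontier concrete, the following is a minimal sketch of how such a frontier could be computed from a set of (cost, accuracy) measurements. It is an illustration only, not the authors' evaluation code: the `ModelPoint` type, the `pareto_frontier` function, and the placeholder values in the usage example are all hypothetical and are not results from the paper.

```python
from typing import List, NamedTuple


class ModelPoint(NamedTuple):
    """Hypothetical record for one evaluated system."""
    name: str
    cost: float      # any cost unit; lower is better
    accuracy: float  # fraction correct; higher is better


def pareto_frontier(points: List[ModelPoint]) -> List[ModelPoint]:
    """Return the points that no other point dominates.

    A point is dominated when another point is at least as cheap and at
    least as accurate, and strictly better on one of the two axes.
    """
    frontier: List[ModelPoint] = []
    best_accuracy = float("-inf")
    # Visit points from cheapest to most expensive; among equal costs,
    # visit the most accurate first so it is the one that is kept.
    for point in sorted(points, key=lambda p: (p.cost, -p.accuracy)):
        # A point survives only if it beats every cheaper-or-equal point.
        if point.accuracy > best_accuracy:
            frontier.append(point)
            best_accuracy = point.accuracy
    return frontier


if __name__ == "__main__":
    # Placeholder values for illustration only, not measured results.
    candidates = [
        ModelPoint("A", 1.0, 0.70),
        ModelPoint("B", 2.0, 0.65),
        ModelPoint("C", 3.0, 0.80),
        ModelPoint("D", 3.0, 0.75),
        ModelPoint("E", 10.0, 0.80),
    ]
    for point in pareto_frontier(candidates):
        print(point.name, point.cost, point.accuracy)
```

Under this definition, a new configuration "shifts" the frontier when it is added to the candidate set and displaces one or more previously non-dominated points, for example by matching their accuracy at lower cost.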
