Expertise Is What We Want
Alan Ashworth, Munir Al-Dajani, Keegan Duchicela, Kiril Kafadarov, Allison Kurian, Othman Laraki, Amina Lazrak, Divneet Mandair, Wendy McKennon, Rebecca Miksad, Jayodita Sanghvi, Travis Zack
TL;DR
The paper addresses the challenge of translating dynamic, guideline-based cancer care into automated decision support without losing nuance. It introduces the Large Language Expert (LLE) architecture, a hybrid system that combines LLMs with machine-readable, versioned knowledge bases to translate guidelines into first-order logic, extract decision factors, and generate explainable recommendations. In a retrospective UCSF study of breast and colon cancer workups, the system achieved >95% accuracy and required clinician adjustments in less than 5% of cases, with finalization times under 7.5 minutes for non-specialists. The results suggest that LLE-based tools like Color Cancer Copilot can streamline guideline-adherent workups while preserving interpretability and allowing site-specific customization, offering a scalable path to broader dissemination of specialized expertise.
Abstract
Clinical decision-making depends on expert reasoning, which is guided by standardized, evidence-based guidelines. However, translating these guidelines into automated clinical decision support systems risks inaccuracy and, importantly, loss of nuance. We share an application architecture, the Large Language Expert (LLE), that combines the flexibility and power of Large Language Models (LLMs) with the interpretability, explainability, and reliability of Expert Systems. LLMs help address key challenges of Expert Systems, such as integrating and codifying knowledge and normalizing data. Conversely, an Expert System-like approach helps overcome limitations of LLMs, including hallucinations, the difficulty of making atomic and inexpensive updates, and limited testability. To demonstrate the LLE approach, we built an LLE to assist with the workup of patients newly diagnosed with cancer. Timely initiation of cancer treatment is critical for optimal patient outcomes. However, the increasing complexity of diagnostic recommendations has made it difficult for primary care physicians to ensure that their patients have completed the necessary workup before their first visit with an oncologist. As with many real-world clinical tasks, these workups require the analysis of unstructured health records and the application of nuanced clinical decision logic. In this study, we describe the design and evaluation of an LLE system built to rapidly identify and suggest the correct diagnostic workup. The system demonstrated a high degree of clinical-level accuracy (>95%) and effectively addressed gaps identified in real-world data from breast and colon cancer patients at a large academic center.
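To make the division of labor described above more concrete, the following is a minimal sketch, not the system's actual implementation. It assumes that an LLM has already extracted decision factors from the health record into a plain dictionary, and that guideline logic lives in a versioned, machine-readable rule set. All names here (`Rule`, `KnowledgeBase`, `recommend`, the factor keys, the rule contents, and the `workup_item_*` labels) are hypothetical placeholders and do not reflect real guideline content.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Decision factors; in an LLE these would be extracted by an LLM from
# unstructured records. Here they are hard-coded (assumption).
Factors = Dict[str, object]


@dataclass(frozen=True)
class Rule:
    rule_id: str
    description: str                      # human-readable form of the condition
    condition: Callable[[Factors], bool]  # logic predicate over the factors
    recommendation: str                   # placeholder workup item


@dataclass(frozen=True)
class KnowledgeBase:
    version: str        # each guideline revision gets its own version
    rules: List[Rule]


def recommend(kb: KnowledgeBase, factors: Factors) -> List[dict]:
    """Apply every rule and return recommendations with their justification."""
    results = []
    for rule in kb.rules:
        if rule.condition(factors):
            results.append({
                "recommendation": rule.recommendation,
                "because": rule.description,
                "rule_id": rule.rule_id,
                "kb_version": kb.version,
            })
    return results


# Hypothetical rules for illustration only; not clinical guidance.
KB_EXAMPLE = KnowledgeBase(
    version="2024.1-example",
    rules=[
        Rule("R1", "age_at_diagnosis <= 50",
             lambda f: f["age_at_diagnosis"] <= 50,
             "workup_item_A"),
        Rule("R2", "stage in {II, III} and imaging not yet done",
             lambda f: f["stage"] in {"II", "III"} and not f["imaging_done"],
             "workup_item_B"),
    ],
)

# Stand-in for the LLM extraction step (assumption).
factors = {"age_at_diagnosis": 45, "stage": "I", "imaging_done": False}
output = recommend(KB_EXAMPLE, factors)
for item in output:
    print(item)
```

In a design like this, each rule can be updated or unit-tested on its own, and every recommendation carries the rule and knowledge-base version that produced it, which is one way the explainability and testability properties mentioned above could be realized.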
