From Generalist to Specialist: Improving Large Language Models for Medical Physics Using ARCoT
Jace Grandinetti, Rafe McBeth
TL;DR
The paper addresses the limited domain-specific accuracy of large language models in medical physics without fine-tuning, relying instead on a retrieval-augmented chain-of-thought framework, ARCoT (Adaptable Retrieval-based Chain of Thought). ARCoT combines a domain-specific retrieval system, Step-Back prompting, and a Re-Ranking Transformer to surface relevant material and guide multi-step reasoning. On a RAPHEX Therapy exam benchmark, ARCoT substantially improved the performance of leading LLMs, raising GPT-4 from $67\%$ to about $90\%$ and yielding an average improvement of $47\%$ over the base models. The work demonstrates a practical, model-agnostic way to improve accuracy and reduce hallucinations in specialized domains, with broad applicability.
Abstract
Large Language Models (LLMs) have achieved remarkable progress, yet their application in specialized fields such as medical physics remains challenging because of the need for domain-specific knowledge. This study introduces ARCoT (Adaptable Retrieval-based Chain of Thought), a framework designed to enhance the domain-specific accuracy of LLMs without requiring fine-tuning or extensive retraining. ARCoT integrates a retrieval mechanism to access relevant domain-specific information and employs step-back and chain-of-thought prompting techniques to guide the LLM's reasoning process, yielding more accurate and context-aware responses. Benchmarked on a medical physics multiple-choice exam, our model outperformed both standard LLMs and the reported average human performance, with improvements of up to 68% and a top score of 90%. This method reduces hallucinations and increases domain-specific performance. Because ARCoT is versatile and model-agnostic, it can be readily adapted to other domains, showing significant potential for improving the accuracy and reliability of LLMs in specialized fields.
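To make the flow of the components named above more concrete, the following is a minimal sketch of how step-back prompting, retrieval, re-ranking, and chain-of-thought prompting might be chained together. It is not the authors' implementation. All function names (`retrieve`, `rerank`, `step_back_prompt`, `build_cot_prompt`, `answer`), the prompt wording, and the order of the steps are assumptions made for illustration. A bag-of-words cosine similarity stands in for the domain-specific retrieval system, a simple term-coverage score stands in for the Re-Ranking Transformer, and `llm` is any caller-supplied function that maps a prompt string to a response string.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, corpus: List[str], k: int = 3) -> List[str]:
    """Toy retriever (assumption): rank passages by bag-of-words cosine."""
    q = Counter(tokenize(query))
    ranked = sorted(
        corpus, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True
    )
    return ranked[:k]


def rerank(query: str, passages: List[str], k: int = 2) -> List[str]:
    """Toy re-ranker (assumption): a stand-in for a re-ranking transformer.

    Scores each passage by the fraction of distinct query terms it contains.
    """
    q_terms = set(tokenize(query))

    def coverage(passage: str) -> float:
        return len(q_terms & set(tokenize(passage))) / max(len(q_terms), 1)

    return sorted(passages, key=coverage, reverse=True)[:k]


def step_back_prompt(question: str) -> str:
    """Ask for the general principle behind the question (hypothetical wording)."""
    return (
        "What general principle or concept is needed to answer the "
        "following question? State it briefly.\n\n"
        f"Question: {question}"
    )


def build_cot_prompt(question: str, principle: str, passages: List[str]) -> str:
    """Combine retrieved context and the principle into a chain-of-thought prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        f"Reference material:\n{context}\n\n"
        f"Relevant principle: {principle}\n\n"
        f"Question: {question}\n\n"
        "Reason step by step, then give the final answer."
    )


def answer(question: str, corpus: List[str], llm: Callable[[str], str]) -> str:
    """Run the sketched pipeline: step back, retrieve, re-rank, then answer."""
    principle = llm(step_back_prompt(question))
    candidates = retrieve(f"{principle} {question}", corpus, k=3)
    passages = rerank(question, candidates, k=2)
    return llm(build_cot_prompt(question, principle, passages))
```

Because `llm` is passed in as a plain callable, the same pipeline can wrap different base models without modification, which is one way to read the "model-agnostic" property described in the abstract. In practice the toy scoring functions would be replaced by a real embedding-based retriever and a learned re-ranker.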
