MedPAO: A Protocol-Driven Agent for Structuring Medical Reports
Shrish Shrinath Vaidya, Gowthamaan Palani, Sidharth Ramesh, Velmurugan Balasubramanian, Minmini Selvam, Gokulraja Srinivasaraja, Ganapathy Krishnamurthi
TL;DR
MedPAO introduces a protocol-driven agent for structuring medical reports by grounding reasoning in established clinical workflows, notably the ABCDEF chest X-ray protocol, and orchestrating a Plan-Act-Observe loop with a modular set of tools. By combining a capable LLM with a model-context protocol and six specialized tools (concept extraction, ontology mapping, ontology filtering, concept categorization, report generation, and caching), the framework delivers protocol-compliant, verifiable outputs and reduces the hallucinations common in monolithic LLMs. Empirical results show an $F1$-score of $0.96$ on concept categorization and strong ratings from clinicians and radiologists (average $4.52$–$4.59$ out of 5), surpassing baseline LLMs and indicating improved reliability for structured radiology reporting. The approach is modality-agnostic and scalable, with potential extensions to additional imaging types and to real-time deployment through hardware optimization and image-integrated workflows.
Abstract
The deployment of Large Language Models (LLMs) for structuring clinical data is critically hindered by their tendency to hallucinate facts and their inability to follow domain-specific rules. To address this, we introduce MedPAO, a novel agentic framework that ensures accuracy and verifiable reasoning by grounding its operation in established clinical protocols such as the ABCDEF protocol for CXR analysis. MedPAO decomposes the report structuring task into a transparent process managed by a Plan-Act-Observe (PAO) loop and specialized tools. This protocol-driven method provides a verifiable alternative to opaque, monolithic models. The efficacy of our approach is demonstrated through rigorous evaluation: MedPAO achieves an F1-score of 0.96 on the critical sub-task of concept categorization. Notably, expert radiologists and clinicians rated the final structured outputs with an average score of 4.52 out of 5, indicating a level of reliability that surpasses baseline approaches relying solely on LLM-based foundation models. The code is available at: https://github.com/MiRL-IITM/medpao-agent
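As a rough illustration of the Plan-Act-Observe idea described above, the following minimal Python sketch runs a loop over a small tool registry. It is not taken from the MedPAO code base: the tool names, the `plan` function, the `run_pao` function, and the keyword table are all hypothetical, the category labels are placeholders rather than the protocol's actual definitions, and the planner is a deterministic stand-in for what would be an LLM in the real system.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

State = Dict[str, Any]

# Toy keyword-to-category table (assumption: placeholder labels only,
# not the categories defined by the ABCDEF protocol).
TOY_CATEGORIES = {"heart": "cardiac", "effusion": "pleura"}


def extract_concepts(state: State) -> State:
    # Toy concept extraction: treat each sentence as one concept.
    sentences = state["report"].split(".")
    state["concepts"] = [s.strip().lower() for s in sentences if s.strip()]
    return state


def categorize_concepts(state: State) -> State:
    # Assign each concept to the first matching toy category, else "other".
    categorized: Dict[str, List[str]] = {}
    for concept in state["concepts"]:
        label = next(
            (cat for key, cat in TOY_CATEGORIES.items() if key in concept),
            "other",
        )
        categorized.setdefault(label, []).append(concept)
    state["categorized"] = categorized
    return state


def generate_report(state: State) -> State:
    # Render one line per category, in alphabetical order of category label.
    lines = [
        f"{label}: {'; '.join(items)}"
        for label, items in sorted(state["categorized"].items())
    ]
    state["structured_report"] = "\n".join(lines)
    return state


# Hypothetical tool registry; the real framework lists six tools.
TOOLS: Dict[str, Callable[[State], State]] = {
    "extract_concepts": extract_concepts,
    "categorize_concepts": categorize_concepts,
    "generate_report": generate_report,
}


def plan(state: State) -> Optional[str]:
    # Deterministic stand-in for the LLM planner: pick the next missing step.
    if "concepts" not in state:
        return "extract_concepts"
    if "categorized" not in state:
        return "categorize_concepts"
    if "structured_report" not in state:
        return "generate_report"
    return None  # nothing left to do


def run_pao(report: str, max_steps: int = 10) -> Tuple[State, List[str]]:
    state: State = {"report": report}
    trace: List[str] = []
    for _ in range(max_steps):
        action = plan(state)            # Plan: choose the next tool
        if action is None:
            break
        state = TOOLS[action](state)    # Act: run the chosen tool
        trace.append(action)            # Observe: record what was done
    return state, trace
```

Calling `run_pao("Heart size is enlarged. Small left pleural effusion.")` returns the final state together with a trace of the tools invoked, which is the kind of step-by-step record that makes the process inspectable rather than opaque.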
