CDR-Agent: Intelligent Selection and Execution of Clinical Decision Rules Using Large Language Model Agents

Zhen Xiang, Aliyah R. Hsu, Austin V. Zane, Aaron E. Kornblith, Margaret J. Lin-Martore, Jasmanpreet C. Kaur, Vasuda M. Dokiparthi, Bo Li, Bin Yu

TL;DR

CDR-Agent tackles the challenge of applying multiple clinical decision rules (CDRs) in emergency departments (EDs) under time pressure by using an LLM-based agent to autonomously select and execute CDRs from unstructured clinical notes. It combines semantic similarity-based CDR selection, structured variable extraction, and deterministic Python-script execution, with Gaussian anomaly detection and negative imputation to improve reliability. The authors built two ED datasets, a synthetic PECARN-derived set and CDR-Bench, and report substantial gains in CDR-selection accuracy and major reductions in computation time compared with a baseline LLM-prompting approach, while producing cautious imaging decisions. The work provides benchmark resources and a path toward real-time, transparent AI-assisted trauma decision-making in EDs.

Abstract

Clinical decision-making is inherently complex and fast-paced, particularly in emergency departments (EDs), where critical, rapid, and high-stakes decisions are made. Clinical Decision Rules (CDRs) are standardized, evidence-based tools that combine signs, symptoms, and clinical variables into decision trees to make consistent and accurate diagnoses. CDR usage is often hindered by clinicians' cognitive load, which limits their ability to quickly recall and apply the appropriate rules. We introduce CDR-Agent, a novel LLM-based system designed to enhance ED decision-making by autonomously identifying and applying the most appropriate CDRs based on unstructured clinical notes. To validate CDR-Agent, we curated two novel ED datasets, a synthetic dataset and CDR-Bench, although CDR-Agent is also applicable to non-ED clinics. CDR-Agent achieves a 56.3% (synthetic) and 8.7% (CDR-Bench) accuracy gain relative to the standalone LLM baseline in CDR selection. Moreover, CDR-Agent significantly reduces computational overhead. Using these datasets, we demonstrate that CDR-Agent not only selects relevant CDRs efficiently but also makes cautious yet effective imaging decisions, minimizing unnecessary interventions while successfully identifying most positively diagnosed cases, and outperforms traditional LLM prompting approaches. Code for our work can be found at: https://github.com/zhenxianglance/medagent-cdr-agent

Paper Structure

This paper contains 19 sections, 5 figures, 2 tables.

Figures (5)

  • Figure 1: An example CDR for C-spine imaging. Top left: the variables/indicators required by the CDR and their definitions. Bottom left: the rule deciding whether the patient requires imaging. Right: the Python script for the CDR with automated imputation of missing variables.
  • Figure 2: Illustration of the three-step workflow of CDR-Agent. For any input clinical note, CDR-Agent first selects a number of relevant CDRs that have high semantic similarity to the clinical note. Variables required by each selected CDR are then extracted from the clinical note using an LLM, with a set of exclusion rules applied to filter invalid CDRs. Finally, CDR-Agent executes the Python code for each valid CDR for decisions.
  • Figure 3: Prompt to LLM for variable extraction from a given clinical note. The prompt includes variables required by the selected CDR and their definitions, and the formatting requirements for the extracted variable values.
  • Figure 4: (Left) A detailed breakdown of CDR label composition in CDR-Bench. Approximately 36.8% of the notes have no applicable CDRs. (Right) Token length variations across different data sources in CDR-Bench. MIMIC-IV notes are significantly longer than those from MedQA and ACN, often containing more noise and distracting information. This highlights both the diversity captured in CDR-Bench and the challenges it presents for CDR selection.
  • Figure 5: (Left) An example Q-Q plot demonstrating that a Gaussian distribution is a reasonable choice for modeling similarity scores of irrelevant CDRs. (Right) Trade-off between F1-score and computation time on a held-out set of CDR-Bench for varying numbers of random sampling iterations and note retention ratios.
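
The figure captions above mention three ideas that a short sketch can make concrete: selecting CDRs whose similarity to the note stands out, imputing missing variables, and executing a rule as a deterministic Python script. The code below is only an illustration under stated assumptions, not the authors' implementation. The function names, the z-score threshold, the toy rule, and its variable names are all hypothetical, and the random sampling and note retention steps mentioned in Figure 5 are not modeled. The selection step assumes that similarity scores of irrelevant CDRs are roughly Gaussian, so a CDR is kept only when its score is an upper-tail outlier. The LLM-based variable extraction step is replaced here by a plain dictionary.

```python
import statistics


def select_cdrs(similarities, z_threshold=2.0):
    """Keep CDRs whose similarity score is an upper-tail outlier.

    Assumption: most candidate CDRs are irrelevant to the note, so the
    scores as a whole approximate the Gaussian of irrelevant CDRs.
    The threshold value is a hypothetical choice for illustration.
    """
    values = list(similarities.values())
    if len(values) < 2:
        return []
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return []  # all scores identical: nothing stands out
    return [name for name, score in similarities.items()
            if (score - mu) / sigma > z_threshold]


def impute_negative(extracted, required):
    """Treat any variable that is missing or None as absent (False)."""
    return {name: bool(extracted.get(name)) for name in required}


# Hypothetical variable names for a toy rule; not a real clinical rule.
TOY_RULE_VARIABLES = ("altered_mental_status", "focal_neuro_deficit", "neck_pain")


def toy_imaging_rule(extracted):
    """Deterministic toy rule: recommend imaging if any indicator is present."""
    values = impute_negative(extracted, TOY_RULE_VARIABLES)
    return any(values.values())


if __name__ == "__main__":
    # Made-up similarity scores between one note and nine candidate CDRs.
    scores = {"a": 0.10, "b": 0.12, "c": 0.11, "d": 0.09, "e": 0.13,
              "f": 0.10, "g": 0.12, "h": 0.11, "toy_cspine": 0.85}
    print(select_cdrs(scores))
    # Stand-in for LLM-extracted variables; unmentioned ones are imputed.
    print(toy_imaging_rule({"neck_pain": True}))
```

Keeping the rule itself in plain code means the same extracted variables always yield the same decision, which is the property the deterministic execution step relies on. Imputing missing variables as negative is a design choice with clinical consequences, so the default here should be read as an assumption of the sketch.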