Agentic Systems in Radiology: Design, Applications, Evaluation, and Challenges
Christian Bluethgen, Dave Van Veen, Daniel Truhn, Jakob Nikolas Kather, Michael Moor, Malgorzata Polacin, Akshay Chaudhari, Thomas Frauenfelder, Curtis P. Langlotz, Michael Krauthammer, Farhad Nooralahzadeh
TL;DR
This paper argues that radiology stands to gain from LLM-based agentic systems that can reason, plan, and act across multi-step tasks by integrating external tools and memory. It surveys the technical foundations (agents, tools, grounding, memory, and design patterns), frames radiology as a complex environment with rich knowledge sources and interoperable infrastructure, and presents concrete applications ranging from report drafting to multidisciplinary team (MDT) discussions. It proposes a multi-tier evaluation framework (planning, execution, outcome, and system level) to capture performance on complex, open-ended tasks, complemented by benchmarks and guidelines to advance safe, effective deployment. The authors highlight significant challenges (LLM limitations, cascading errors, multi-agent coordination, and governance) and argue that careful design, robust evaluation, and human-AI collaboration are essential for translating agentic radiology from prototype to clinical impact.
Abstract
Building agents, systems that perceive and act upon their environment with a degree of autonomy, has long been a focus of AI research. This pursuit has recently become vastly more practical with the emergence of large language models (LLMs) capable of using natural language to integrate information, follow instructions, and perform forms of "reasoning" and planning across a wide range of tasks. With its multimodal data streams and orchestrated workflows spanning multiple systems, radiology is uniquely suited to benefit from agents that can adapt to context and automate repetitive yet complex tasks. In radiology, LLMs and their multimodal variants have already demonstrated promising performance for individual tasks such as information extraction and report summarization. However, using LLMs in isolation underutilizes their potential to support complex, multi-step workflows where decisions depend on evolving context from multiple information sources. Equipping LLMs with external tools and feedback mechanisms enables them to drive systems that exhibit a spectrum of autonomy, ranging from semi-automated workflows to more adaptive agents capable of managing complex processes. This review examines the design of such LLM-driven agentic systems, highlights key applications, discusses evaluation methods for planning and tool use, and outlines challenges such as error cascades, tool-use efficiency, and health IT integration.
