CT-Flow: Orchestrating CT Interpretation Workflow with Model Context Protocol Servers

Yannian Gu; Xizhuo Zhang; Linjie Mu; Yongrui Yu; Zhongzhen Huang; Shaoting Zhang; Xiaofan Zhang

TL;DR

CT-Flow is an agentic framework for interoperable volumetric interpretation that provides a scalable foundation for integrating autonomous, agentic intelligence into real-world clinical radiology.

Abstract

Recent advances in Large Vision-Language Models (LVLMs) have shown strong potential for multi-modal radiological reasoning, particularly in tasks such as diagnostic visual question answering (VQA) and radiology report generation. However, most existing approaches to 3D CT analysis rely on static, single-pass inference. In practice, clinical interpretation is a dynamic, tool-mediated workflow in which radiologists iteratively review slices and use measurement, radiomics, and segmentation tools to refine findings. To bridge this gap, we propose CT-Flow, an agentic framework designed for interoperable volumetric interpretation. By leveraging the Model Context Protocol (MCP), CT-Flow shifts from closed-box inference to an open, tool-aware paradigm. We curate CT-FlowBench, the first large-scale instruction-tuning benchmark tailored for 3D CT tool use and multi-step reasoning. Built upon this benchmark, CT-Flow functions as a clinical orchestrator capable of decomposing complex natural language queries into automated tool-use sequences. Experimental evaluations on CT-FlowBench and standard 3D VQA datasets demonstrate that CT-Flow achieves state-of-the-art performance, surpassing baseline models by 41% in diagnostic accuracy and achieving a 95% success rate in autonomous tool invocation. This work provides a scalable foundation for integrating autonomous, agentic intelligence into real-world clinical radiology.

Paper Structure (44 sections, 2 equations, 5 figures, 5 tables)

Introduction
Related Work
3D Volumetric Interpretation in Medical VLMs.
Agentic Reasoning and Tool Orchestration.
Medical Agents and Clinical Tool-use.
Methodology
Standardizing Clinical Interface via MCP
Iterative Probing over ReAct
Dataset Construction
Design Rationale.
Data Source and Curation
Task Scenario Definitions.
Quantitative Analysis.
Spatial Mapping.
Diagnostic Inference.
...and 29 more sections

Figures (5)

Figure 1: Comparison of 3D CT analysis paradigms. Left: Traditional End-to-End LVLMs rely on passive visual ingestion of 3D data, resulting in static textual outputs. Right: The proposed CT-Flow framework leverages the Model Context Protocol to transform the LLM into an active agent. It dynamically orchestrates specialized tools to deliver precise, multi-modal diagnosis.
Figure 2: Overview of the CT-Flow framework. (i) Data Construction: The pipeline for raw data curation, trajectory synthesis, and the establishment of the CT-Flow benchmark. (ii) Architectures: The system decouples the LLM orchestrator from the imaging environment via FASTMCP, bridging high-level servers with medical imaging infrastructures to provide a suite of atomic tools in the Tool Space. (iii) Case Study: A demonstration of a Language-Action Trajectory $\mathcal{T}$. The orchestrator performs Active Probing by iteratively generating reasoning states ($s_t$), executing tool calls ($a_t$), and interpreting high-fidelity observations ($o_t$) to reach a grounded diagnostic answer.
Figure 3: Comparative performance of various models using the CT-Flow framework vs. the slice-based baseline.
Figure 4: Performance impact of tool category ablation. Bars indicate accuracy (%) and the red line tracks format errors. Removing specific tool classes (cls. 2-4) leads to decreased diagnostic accuracy and increased errors across all tasks, validating the necessity of the full hierarchical toolset.
Figure 5: Structure of the System Prompt. The core principles, critical thinking rules, and standard operating procedures (SOPs) for the AI Medical Imaging Assistant are displayed. To facilitate a clear presentation, the specific technical definitions of available tools have been truncated.
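
The Language-Action Trajectory described in the Figure 2 caption (reasoning state $s_t$, tool call $a_t$, observation $o_t$) can be illustrated with a minimal sketch. This is not the authors' implementation: the tool `measure_diameter`, the `TOOLS` registry, the `Step` record, and the `scripted_policy` stand-in for the LLM orchestrator are all hypothetical names chosen for illustration, and the tool returns a canned string rather than querying a real MCP server or CT volume.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


def measure_diameter(region: str) -> str:
    """Hypothetical toy tool: returns a canned measurement for a named region."""
    return f"{region}: longest diameter 12.0 mm"


# Hypothetical registry standing in for tools exposed by MCP servers.
TOOLS: Dict[str, Callable[[str], str]] = {"measure_diameter": measure_diameter}


@dataclass
class Step:
    reasoning: str                      # s_t: the orchestrator's reasoning state
    action: Optional[Tuple[str, str]]   # a_t: (tool name, argument), None on the final step
    observation: Optional[str]          # o_t: the tool output, None on the final step


def scripted_policy(query: str, history: List[Step]):
    """Stand-in for the LLM orchestrator (assumption): one tool call, then answer."""
    if not history:
        return "I need a measurement of the nodule.", ("measure_diameter", "nodule"), None
    answer = f"Answer based on: {history[-1].observation}"
    return "The measurement is available; I can answer.", None, answer


def run_trajectory(query: str, policy, tools, max_steps: int = 5):
    """Alternate reasoning, tool call, and observation until the policy answers."""
    history: List[Step] = []
    for _ in range(max_steps):
        reasoning, action, answer = policy(query, history)
        if action is None:
            # No further tool call requested: record the final reasoning and stop.
            history.append(Step(reasoning, None, None))
            return history, answer
        name, arg = action
        tool = tools.get(name)
        # Unknown tools yield an error observation instead of raising.
        observation = tool(arg) if tool else f"error: unknown tool {name!r}"
        history.append(Step(reasoning, action, observation))
    return history, None  # step budget exhausted without an answer


trajectory, answer = run_trajectory("How large is the nodule?", scripted_policy, TOOLS)
for t, step in enumerate(trajectory):
    print(t, step.reasoning, step.action, step.observation)
print(answer)
```

In this sketch the policy is scripted, so the loop always ends after one tool call; in a real system the policy would be a language model whose next step depends on the accumulated observations.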
