CardAIc-Agents: A Multimodal Framework with Hierarchical Adaptation for Cardiac Care Support

Yuting Zhang, Karina V. Bunting, Asgher Champsi, Xiaoxia Wang, Wenqi Lu, Alexander Thorley, Sandeep S Hothi, Zhaowen Qiu, Baturalp Buyukates, Dipak Kotecha, Jinming Duan

TL;DR

CardAIc-Agents addresses the need for adaptive, multimodal AI in cardiac care with a two-part system: the CardiacRAG agent builds an updatable cardiac knowledge base and generates task-aware plans via a hybrid retrieval pipeline, while the Chief agent orchestrates tool use, complexity-driven planning, and multidisciplinary discussions. The framework supports stepwise plan refinement, multidisciplinary team (MDT) interpretation for complex cases, and on-demand visual validation panels to assist clinicians. Experiments on three public datasets show CardAIc-Agents outperforming vision-language models and medical agents in both diagnostic accuracy and AUC, highlighting the benefits of domain-specific knowledge integration and dynamic adaptation. The work points to practical value for scalable cardiac care and to future work on on-demand visual outputs and multitask reasoning in clinical AI.

Abstract

Cardiovascular diseases (CVDs) remain the foremost cause of mortality worldwide, a burden worsened by a severe deficit of healthcare workers. Artificial intelligence (AI) agents have shown potential to alleviate this gap through automated detection and proactive screening, yet their clinical application remains limited by: 1) rigid sequential workflows, whereas clinical care often requires adaptive reasoning that selects specific tests and, based on their results, guides personalised next steps; 2) reliance solely on intrinsic model capabilities to perform role assignment without domain-specific tool support; 3) general and static knowledge bases without continuous learning capability; and 4) fixed unimodal or bimodal inputs and a lack of on-demand visual outputs when clinicians require visual clarification. In response, a multimodal framework, CardAIc-Agents, was proposed to augment models with external tools and adaptively support diverse cardiac tasks. First, a CardiacRAG agent generated task-aware plans from updatable cardiac knowledge, while the Chief agent integrated tools to autonomously execute these plans and deliver decisions. Second, to enable adaptive and case-specific customization, a stepwise update strategy was developed to dynamically refine plans based on preceding execution results once the task was assessed as complex. Third, a multidisciplinary discussion team was introduced and automatically invoked to interpret challenging cases, thereby supporting further adaptation. In addition, visual review panels were provided to assist validation when clinicians raised concerns. Experiments across three datasets showed the efficiency of CardAIc-Agents compared with mainstream Vision-Language Models (VLMs) and state-of-the-art agentic systems.
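
The adaptive workflow described in the abstract (plan, execute, refine when the task is complex, escalate challenging cases to a discussion team) can be illustrated with a minimal control-flow sketch. Everything named here is hypothetical: `run_case`, the callables passed to it, and the choice to invoke the discussion team after execution are assumptions made for illustration, not the authors' implementation.

```python
from typing import Callable, Optional


def run_case(
    query: str,
    plan: list[str],
    execute_step: Callable[[str, list[str]], str],
    is_complex: Callable[[str], bool],
    refine_plan: Callable[[list[str], list[str]], list[str]],
    needs_discussion: Callable[[list[str]], bool],
    discuss: Callable[[str, list[str]], str],
    max_steps: int = 10,
) -> dict:
    """Execute a plan step by step; all callables are hypothetical stand-ins."""
    adaptive = is_complex(query)      # complexity gate for stepwise updates
    remaining = list(plan)            # steps still to run
    executed: list[str] = []
    results: list[str] = []

    while remaining and len(executed) < max_steps:
        step = remaining.pop(0)
        results.append(execute_step(step, results))
        executed.append(step)
        if adaptive:
            # Stepwise update: revise the remaining steps using results so far.
            remaining = refine_plan(remaining, results)

    # Assumption: the discussion team is consulted once execution has finished.
    opinion: Optional[str] = discuss(query, results) if needs_discussion(results) else None
    return {"executed": executed, "results": results, "opinion": opinion}
```

For a task judged simple, the loop reduces to running the initial plan in order; only tasks flagged as complex pay the cost of re-planning after every step.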

Paper Structure

This paper contains 9 sections, 13 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Overview of the CardAIc-Agents framework.
  • Figure 2: Illustration of the CardiacRAG agent. $D_i$ denotes the $i$-th source, $S_i$ is the cleaned text, $s_i^j$ is the $j$-th chunk from source $i$, $v_i^j$ is its corresponding vector, $T$ represents the transformation method, $K$ denotes keyword-based filtering, $n$ is the number of chunks retrieved, $c$ is the final retrieved content, and Cite indicates optional return of original chunks for transparency and reference.
  • Figure 3: (a) Illustration of the Multidisciplinary Discussion Team. (b) Iterative Procedure. $Q$ denotes the original input, $Z$ the intermediate tool outputs, $E_{*}, P_{*}, D_{*}$ the model responses, with $T$ as the total steps and $t$ the step index.
  • Figure 4: Visual panel generated by CardAIc-Agents: (a) patient profile display; (b) ECG waveform with labeled P and T waves; (c) echocardiographic view identification with A4C segmentation video frames. Additional details are in the appendices.
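
The notation in the Figure 2 caption can be made concrete with a small retrieval sketch. It is a toy under stated assumptions, not the CardiacRAG pipeline itself: the cleaning rule, the fixed-size chunking, the bag-of-words stand-in for $T$, and the order of the two stages (keyword filter $K$ first, then similarity ranking) are all chosen here for illustration, and every function name is hypothetical.

```python
import math
import re
from collections import Counter


def clean(document: str) -> str:
    """D_i -> S_i: collapse whitespace (stand-in for real text cleaning)."""
    return re.sub(r"\s+", " ", document).strip()


def chunk(text: str, size: int = 8) -> list[str]:
    """S_i -> s_i^j: fixed-size word windows (the chunking rule is assumed)."""
    words = text.split()
    return [" ".join(words[k:k + size]) for k in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """T: toy bag-of-words vector; a real system would use a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def build_index(sources: list[str]) -> list[tuple[tuple[int, int], str, Counter]]:
    """Store ((i, j), s_i^j, v_i^j) for every chunk of every source."""
    return [((i, j), piece, embed(piece))
            for i, doc in enumerate(sources)
            for j, piece in enumerate(chunk(clean(doc)))]


def retrieve(query, index, keywords, n=2, cite=False):
    """Return content c built from the top-n chunks; optionally return the chunks too."""
    query_vec = embed(query)
    # K: keep only chunks mentioning at least one keyword (substring match).
    candidates = [e for e in index if any(k.lower() in e[1].lower() for k in keywords)]
    # Rank the surviving chunks by similarity to the query and keep n of them.
    ranked = sorted(candidates, key=lambda e: cosine(query_vec, e[2]), reverse=True)[:n]
    content = "\n".join(piece for _, piece, _ in ranked)
    if cite:
        # Cite: also return the original chunks with their (i, j) identifiers.
        return content, [(ids, piece) for ids, piece, _ in ranked]
    return content
```

Because the index is rebuilt from a plain list of sources, adding a document and calling `build_index` again is enough to update the knowledge base in this toy version.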