AMANDA: Agentic Medical Knowledge Augmentation for Data-Efficient Medical Visual Question Answering

Ziqing Wang, Chengsheng Mao, Xiaole Wen, Yuan Luo, Kaize Ding

TL;DR

AMANDA introduces a training-free, agentic framework for data-efficient Med-VQA by coupling intrinsic medical knowledge augmentation (coarse-to-fine question decomposition) with extrinsic grounding via a biomedical knowledge graph. The architecture orchestrates perception, planning, and action through multiple LLM-powered agents to iteratively refine answers and ground reasoning in external knowledge, with adaptive refinement and few-shot enhancements. Across eight diverse Med-VQA benchmarks, AMANDA yields substantial zero-shot and few-shot improvements across Med-MLLM backbones and reduces medical hallucinations by grounding explanations in reliable domain knowledge. The work demonstrates the potential of training-free, multi-agent reasoning to deliver reliable medical visual reasoning in resource-constrained settings.

Abstract

Medical Multimodal Large Language Models (Med-MLLMs) have shown great promise in medical visual question answering (Med-VQA). However, when deployed in low-resource settings where abundant labeled data are unavailable, existing Med-MLLMs commonly fail due to two bottlenecks in their medical reasoning capability: (i) an intrinsic reasoning bottleneck, in which the model ignores details in the medical image; and (ii) an extrinsic reasoning bottleneck, in which the model fails to incorporate specialized medical knowledge. To address these limitations, we propose AMANDA, a training-free agentic framework that performs medical knowledge augmentation via LLM agents. Specifically, our intrinsic medical knowledge augmentation focuses on coarse-to-fine question decomposition for comprehensive diagnosis, while extrinsic medical knowledge augmentation grounds the reasoning process via biomedical knowledge graph retrieval. Extensive experiments across eight Med-VQA benchmarks demonstrate substantial improvements in both zero-shot and few-shot Med-VQA settings. The code is available at https://github.com/REAL-Lab-NU/AMANDA.
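
To make the interplay of intrinsic and extrinsic augmentation concrete, the following is a minimal control-loop sketch, not the authors' implementation. The five callables mirror the agent roles named in the Figure 1 caption (Perceiver, Reasoner, Evaluator, Explorer, Retriever), but their signatures, the `ReasoningState` container, the `max_rounds` cap, and the toy stubs in `demo` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ReasoningState:
    """Hypothetical container for what the agents share between rounds."""
    question: str
    caption: str
    answer: str = ""
    history: List[str] = field(default_factory=list)


def run_augmentation_loop(
    question: str,
    perceive: Callable[[str], str],                   # assumed: text query -> visual finding
    reason: Callable[[ReasoningState], str],          # assumed: state -> candidate answer
    evaluate: Callable[[ReasoningState], bool],       # assumed: True when answer is accepted
    explore: Callable[[ReasoningState], List[str]],   # assumed: state -> finer sub-questions
    retrieve: Callable[[ReasoningState], List[str]],  # assumed: state -> knowledge-graph facts
    max_rounds: int = 3,                              # assumed cap on refinement rounds
) -> ReasoningState:
    state = ReasoningState(question=question, caption=perceive(question))
    state.answer = reason(state)
    for _ in range(max_rounds):
        if evaluate(state):
            break  # answer judged consistent with the history; stop augmenting
        # Intrinsic augmentation: ask finer-grained sub-questions about the image.
        for sub_question in explore(state):
            state.history.append(f"Q: {sub_question} A: {perceive(sub_question)}")
        # Extrinsic augmentation: append retrieved knowledge-graph facts.
        state.history.extend(retrieve(state))
        state.answer = reason(state)
    return state


def demo() -> ReasoningState:
    """Toy stubs only; a real system would call a Med-MLLM, an LLM, and a KG."""
    perceive = lambda q: f"finding for '{q}'"
    reason = lambda s: "yes" if len(s.history) >= 2 else "unsure"
    evaluate = lambda s: s.answer != "unsure"
    explore = lambda s: ["Which organ is shown?"]
    retrieve = lambda s: ["KG: pneumonia -- finding_site --> lung"]
    return run_augmentation_loop(
        "Is there pneumonia?", perceive, reason, evaluate, explore, retrieve
    )
```

With the toy stubs, the first answer is rejected, one round of decomposition and retrieval adds two history entries, and the second answer is accepted. Passing the agents in as plain callables keeps the loop independent of any particular model backend, which fits the training-free framing.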

Paper Structure

This paper contains 31 sections, 3 equations, 3 figures, 15 tables, 1 algorithm.

Figures (3)

  • Figure 1: Overview of our AMANDA framework. The framework comprises five specialized agents (Perceiver, Reasoner, Evaluator, Explorer, and Retriever) working collaboratively to enable comprehensive and reliable medical reasoning. Specifically, the Explorer incorporates intrinsic medical knowledge through coarse-to-fine question decomposition to enhance reasoning depth, and the Retriever integrates extrinsic medical knowledge from biomedical knowledge graphs to enable reliable medical reasoning. The Evaluator adaptively controls the depth of medical knowledge augmentation (Med-KA) to enable efficient and accurate medical diagnosis.
  • Figure 2: (a) Adaptive Reasoning Refinement: The Evaluator agent dynamically controls the medical knowledge augmentation process by analyzing the consistency between the current answer and accumulated reasoning history. (b) In-Context Examples Selection: The system ranks candidate examples using a dual-similarity metric combining visual and textual features, selecting top-K examples as in-context examples.
  • Figure 3: Analysis of framework components.
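
The in-context example selection described in the Figure 2(b) caption can be sketched as follows. The caption says only that a dual-similarity metric combines visual and textual features; the weighted sum of cosine similarities, the weight `alpha`, and the function names below are assumptions for illustration, and the embeddings are taken as given rather than produced by any particular encoder.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_top_k(
    query_img: Vector,
    query_txt: Vector,
    candidates: List[Tuple[Vector, Vector]],  # (image embedding, question embedding)
    k: int = 2,
    alpha: float = 0.5,  # assumed weight on visual similarity
) -> List[int]:
    """Return indices of the k candidates with the highest combined score."""
    scored = []
    for idx, (img, txt) in enumerate(candidates):
        # Assumed combination rule: convex mix of visual and textual similarity.
        score = alpha * cosine(query_img, img) + (1 - alpha) * cosine(query_txt, txt)
        scored.append((score, idx))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy 2-D embeddings: candidate 0 matches both modalities, candidate 2 only the image.
pool = [([1.0, 0.0], [0.0, 1.0]), ([0.0, 1.0], [1.0, 0.0]), ([1.0, 0.0], [1.0, 0.0])]
top = select_top_k([1.0, 0.0], [0.0, 1.0], pool, k=2)
```

Setting `alpha` closer to 1 would favor visually similar cases, while a value closer to 0 would favor similar question wording; the balance used in the paper is not stated in this summary.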