IMACT-CXR - An Interactive Multi-Agent Conversational Tutoring System for Chest X-Ray Interpretation

Tuan-Anh Le, Anh Mai Vu, David Yang, Akash Awasthi, Hien Van Nguyen

TL;DR

IMACT-CXR is a multi-agent conversational tutor for chest X-ray interpretation that unifies spatial validation, gaze analytics, knowledge retrieval, and image-grounded reasoning within an AutoGen-based workflow. The system uses Bayesian Knowledge Tracing to adapt feedback, trigger PubMed evidence retrieval, surface similar REFLACX cases, and invoke NV-Reason-CXR-3B reasoning when needed, all while sanitizing ground-truth labels to preserve discovery-based learning. Core contributions include AutoGen-driven orchestration of multimodal tutoring, anatomically aware gaze coaching, skill-aware reasoning and knowledge delivery, case similarity retrieval, and safety mechanisms that prevent premature disclosure. Preliminary evaluations and ablation studies suggest cumulative gains in localization accuracy, diagnostic reasoning, and mastery speed, with bounded latency suitable for interactive deployment; formal multi-participant studies and on-premise optimizations are planned to validate generalizability and efficiency.

Abstract

IMACT-CXR is an interactive multi-agent conversational tutor that helps trainees interpret chest X-rays by unifying spatial annotation, gaze analysis, knowledge retrieval, and image-grounded reasoning in a single AutoGen-based workflow. The tutor simultaneously ingests learner bounding boxes, gaze samples, and free-text observations. Specialized agents evaluate localization quality, generate Socratic coaching, retrieve PubMed evidence, suggest similar cases from REFLACX, and trigger NV-Reason-CXR-3B for vision-language reasoning when mastery remains low or the learner explicitly asks. Bayesian Knowledge Tracing (BKT) maintains skill-specific mastery estimates that drive both knowledge reinforcement and case similarity retrieval. A lung-lobe segmentation module derived from a TensorFlow U-Net enables anatomically aware gaze feedback, and safety prompts prevent premature disclosure of ground-truth labels. We describe the system architecture, implementation highlights, and integration with the REFLACX dataset for real DICOM cases. IMACT-CXR demonstrates responsive tutoring flows with bounded latency, precise control over answer leakage, and extensibility toward live residency deployment. Preliminary evaluation shows improved localization and diagnostic reasoning compared to baselines.
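As an illustration of how skill-specific mastery estimates could gate the reasoning agent, the following is a minimal sketch of the standard Bayesian Knowledge Tracing update. It is not the authors' implementation: the class name `BKTSkill`, the helper `should_trigger_reasoning`, all parameter values, and the 0.6 mastery threshold are assumptions chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class BKTSkill:
    """Mastery state for one skill (e.g. one finding type).

    All default values below are illustrative assumptions, not values
    reported in the paper.
    """
    p_mastery: float = 0.2  # P(L): current probability the skill is mastered
    p_learn: float = 0.15   # P(T): chance of learning after each attempt
    p_slip: float = 0.1     # P(S): chance of an error despite mastery
    p_guess: float = 0.2    # P(G): chance of a correct answer without mastery

    def update(self, correct: bool) -> float:
        """Apply one standard BKT step and return the new mastery estimate."""
        p = self.p_mastery
        if correct:
            # Posterior of mastery given a correct response.
            num = p * (1 - self.p_slip)
            den = num + (1 - p) * self.p_guess
        else:
            # Posterior of mastery given an incorrect response.
            num = p * self.p_slip
            den = num + (1 - p) * (1 - self.p_guess)
        posterior = num / den
        # Account for the chance that the learner acquired the skill
        # during this attempt.
        self.p_mastery = posterior + (1 - posterior) * self.p_learn
        return self.p_mastery


def should_trigger_reasoning(skill: BKTSkill, learner_asked: bool,
                             threshold: float = 0.6) -> bool:
    """Hypothetical gate: escalate to the vision-language reasoning agent
    when the learner asks explicitly or mastery is below the threshold."""
    return learner_asked or skill.p_mastery < threshold


if __name__ == "__main__":
    skill = BKTSkill()
    for outcome in (False, False, True):
        skill.update(outcome)
        print(round(skill.p_mastery, 3),
              should_trigger_reasoning(skill, learner_asked=False))
```

Keeping one such state object per skill would let the same mastery estimate drive both decisions the abstract mentions, knowledge reinforcement and similar-case retrieval, although the exact thresholds and triggering rules used by IMACT-CXR are not specified here.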

Paper Structure

This paper contains 22 sections, 6 equations, 1 figure, 4 tables.

Figures (1)

  • Figure 1: System architecture showing multi-agent orchestration and state transitions.