Decoding Radiologists' Intentions: A Novel System for Accurate Region Identification in Chest X-ray Image Analysis
Akash Awasthi, Safwan Ahmad, Bryant Le, Hien Van Nguyen
TL;DR
This work targets errors in chest X-ray interpretation by modeling the intentions radiologists express in their reports and anchoring them to regions of interest (ROIs) in fixation-based video data. It introduces a two-module system, Temporally Grounded Intention Detection (TGID) and Region Extraction (RE), that fuses a video backbone with a language backbone to output temporally grounded intention sequences and extract representative ROIs, adapting Dense Video Captioning to the medical domain. Pretraining on ActivityNet Captions followed by fine-tuning on radiology datasets (EGD-CXR, REFLACX) lets the model learn long-range, temporally grounded associations between visual fixation patterns and textual findings. Evaluation uses both natural language generation metrics and a novel time-delay error metric (MTDE). The approach performs competitively against state-of-the-art baselines and yields interpretable visualizations of radiologists' intentions, which could improve training, provide actionable feedback, and reduce diagnostic errors in chest radiography.
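To make the link between a temporally grounded intention and a region of interest concrete, the sketch below shows one simple way such a mapping could work. It is a hypothetical illustration, not the authors' Region Extraction module. It assumes that each fixation is a (t, x, y) triple with t in seconds and x, y in image pixels, that each predicted intention is a (text, start, end) tuple, and that an ROI can be approximated by the padded bounding box of all fixations inside the intention's time window. The names `Fixation`, `roi_for_interval`, and `rois_for_intentions` are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass
class Fixation:
    """Hypothetical gaze fixation record (assumed format, not from the paper)."""
    t: float  # timestamp in seconds
    x: float  # horizontal pixel coordinate
    y: float  # vertical pixel coordinate


def roi_for_interval(
    fixations: List[Fixation],
    start: float,
    end: float,
    image_size: Tuple[int, int],
    pad: float = 0.0,
) -> Optional[Box]:
    """Bounding box of fixations with start <= t <= end, padded and clipped.

    Returns None when no fixation falls inside the time window.
    """
    width, height = image_size
    inside = [f for f in fixations if start <= f.t <= end]
    if not inside:
        return None
    x_min = min(f.x for f in inside) - pad
    y_min = min(f.y for f in inside) - pad
    x_max = max(f.x for f in inside) + pad
    y_max = max(f.y for f in inside) + pad
    # Clip the padded box to the image bounds.
    return (
        max(0.0, float(x_min)),
        max(0.0, float(y_min)),
        min(float(width), float(x_max)),
        min(float(height), float(y_max)),
    )


def rois_for_intentions(
    intentions: List[Tuple[str, float, float]],
    fixations: List[Fixation],
    image_size: Tuple[int, int],
    pad: float = 0.0,
) -> List[Tuple[str, Optional[Box]]]:
    """Pair each (text, start, end) intention with the ROI of its time window."""
    return [
        (text, roi_for_interval(fixations, start, end, image_size, pad))
        for text, start, end in intentions
    ]


if __name__ == "__main__":
    fixations = [
        Fixation(0.5, 100, 100),
        Fixation(1.2, 300, 420),
        Fixation(2.0, 360, 500),
        Fixation(3.1, 900, 50),
    ]
    intentions = [("enlarged cardiac silhouette", 1.0, 2.5), ("pleural effusion", 5.0, 6.0)]
    for text, box in rois_for_intentions(intentions, fixations, (1024, 1024), pad=20):
        print(text, box)
```

Here the first intention maps to the box (280.0, 400.0, 380.0, 520.0) around the two fixations in its window, while the second gets None because no gaze data falls in [5.0, 6.0] s. A real system would likely use a more robust region estimate (for example, fixation-density maps) rather than a raw bounding box.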
Abstract
In chest X-ray (CXR) image analysis, radiologists meticulously examine various regions of the image and document their observations in reports. Errors in CXR diagnosis are common, particularly among inexperienced radiologists and hospital residents, which underscores the importance of understanding radiologists' intentions and the corresponding regions of interest. This understanding is crucial for correcting mistakes by guiding radiologists to the correct regions of interest, especially when diagnosing chest radiograph abnormalities. To address this need, we propose a novel system that identifies the primary intentions radiologists articulate in their reports and the corresponding regions of interest in CXR images. The system aims to reveal the visual context underlying radiologists' textual findings, with the potential to correct errors made by less experienced practitioners and direct them to precise regions of interest. Importantly, the proposed system can provide constructive feedback to inexperienced radiologists and junior residents in the hospital, helping to bridge gaps in face-to-face communication. It thus offers a valuable tool for enhancing diagnostic accuracy and fostering continuous learning within the medical community.
