
Decoding Radiologists' Intentions: A Novel System for Accurate Region Identification in Chest X-ray Image Analysis

Akash Awasthi, Safwan Ahmad, Bryant Le, Hien Van Nguyen

TL;DR

This work tackles errors in chest X-ray interpretation by modeling the intentions radiologists express in their reports and anchoring them to regions of interest (ROIs) in fixation-based video data. It introduces a two-module system, Temporally Grounded Intention Detection (TGID) and Region Extraction (RE), that fuses a video backbone with a language backbone to output temporally grounded intention sequences and to extract representative ROIs, adapting Dense Video Captioning to the medical domain. Pretraining on ActivityNet Captions followed by finetuning on radiology datasets (EGD-CXR, REFLACX) lets the model learn long-range, temporally grounded associations between visual fixation patterns and textual findings. Evaluation uses both natural language generation metrics and a novel time-delay error metric (MTDE). The approach performs competitively against state-of-the-art baselines and yields interpretable visualizations of radiologist intentions, with the potential to improve training, provide actionable feedback, and reduce diagnostic errors in chest radiography.
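
The TL;DR names a time-delay error metric (MTDE) but does not define it. Purely as an illustration, the sketch below assumes such a metric averages the absolute start and end offsets between predicted and ground-truth intentions that are matched one-to-one by position. The `Intention` class and the `mean_time_delay_error` function are hypothetical names, and the paper's exact MTDE formula may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Intention:
    label: str    # finding text, e.g. "cardiomegaly"
    start: float  # start time in seconds within the fixation video
    end: float    # end time in seconds


def mean_time_delay_error(pred: List[Intention], true: List[Intention]) -> float:
    """Illustrative time-delay error (NOT the paper's exact MTDE definition).

    Assumes predictions and ground truth are matched one-to-one by position,
    and averages the absolute start and end offsets over all intentions.
    """
    if len(pred) != len(true):
        raise ValueError("this sketch assumes index-aligned, one-to-one matches")
    if not pred:
        return 0.0
    total = 0.0
    for p, t in zip(pred, true):
        # Per-intention delay: mean of the start offset and the end offset.
        total += (abs(p.start - t.start) + abs(p.end - t.end)) / 2.0
    return total / len(pred)


if __name__ == "__main__":
    truth = [Intention("cardiomegaly", 2.0, 6.0), Intention("pleural effusion", 8.0, 12.0)]
    preds = [Intention("cardiomegaly", 2.5, 5.0), Intention("pleural effusion", 9.0, 12.0)]
    print(mean_time_delay_error(preds, truth))  # 0.625 seconds
```

Matching by position keeps the sketch short; a real evaluation would more likely pair intentions by label or by temporal overlap before measuring delays.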

Abstract

In chest X-ray (CXR) image analysis, radiologists examine various regions of the image and document their observations in reports. Errors in CXR diagnosis are prevalent, particularly among inexperienced radiologists and hospital residents, which underscores the importance of understanding radiologists' intentions and the corresponding regions of interest. This understanding is crucial for correcting mistakes by guiding radiologists to the correct regions of interest, especially when diagnosing chest radiograph abnormalities. To this end, we propose a novel system that identifies the primary intentions articulated by radiologists in their reports and the corresponding regions of interest in CXR images. The system aims to reveal the visual context underlying radiologists' textual findings, with the potential to correct errors made by less experienced practitioners and direct them to the relevant regions of interest. Importantly, it can provide constructive feedback to inexperienced radiologists and junior residents in the hospital, bridging the gap left by limited face-to-face communication. The system is a valuable tool for enhancing diagnostic accuracy and fostering continuous learning within the medical community.
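
To make the pairing of an intention with its region of interest concrete, here is a minimal sketch under stated assumptions: gaze fixations are given as timestamped pixel coordinates, and the ROI for an intention is taken as the padded bounding box of the fixations that fall inside its predicted time window. The `Fixation` class, the `extract_roi` function, and the `margin` parameter are hypothetical; the paper's Region Extraction (RE) module may select representative ROIs differently.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Fixation:
    t: float  # timestamp in seconds
    x: float  # horizontal position in image pixels
    y: float  # vertical position in image pixels


def extract_roi(
    fixations: List[Fixation],
    start: float,
    end: float,
    image_size: Tuple[int, int],
    margin: float = 20.0,  # hypothetical padding in pixels
) -> Optional[Tuple[float, float, float, float]]:
    """Hypothetical ROI rule: padded box (x0, y0, x1, y1) around the fixations
    inside an intention's [start, end] window, clipped to the image bounds."""
    window = [f for f in fixations if start <= f.t <= end]
    if not window:
        return None  # no gaze evidence for this intention
    width, height = image_size
    x0 = max(0.0, min(f.x for f in window) - margin)
    y0 = max(0.0, min(f.y for f in window) - margin)
    x1 = min(float(width), max(f.x for f in window) + margin)
    y1 = min(float(height), max(f.y for f in window) + margin)
    return (x0, y0, x1, y1)


if __name__ == "__main__":
    gaze = [Fixation(1.0, 100.0, 100.0), Fixation(3.0, 400.0, 500.0),
            Fixation(4.0, 450.0, 520.0), Fixation(9.0, 900.0, 900.0)]
    # Intention predicted to span 2 s to 5 s in a 1024 x 1024 image.
    print(extract_roi(gaze, 2.0, 5.0, (1024, 1024)))  # (380.0, 480.0, 470.0, 540.0)
```

A bounding box is the simplest choice; weighting fixations by duration or clustering them would give tighter regions when gaze wanders during a single finding.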

Paper Structure

This paper contains 15 sections, 1 equation, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: Our proposed system for detecting radiologists' intentions and the corresponding ROIs
  • Figure 2: TGID module overview: a sequence-to-sequence model that takes video features and a summarized radiology report with appended time tokens as input, and outputs the intention sequence with temporal grounding
  • Figure 3: Anticipated intentions of radiologists and their corresponding regions of interest. The figure highlights the specific image areas that radiologists focus on for each diagnosis
  • Figure 4: Distribution of the difference between the predicted and true start and end times for each intention in the test set
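
Figure 2 describes TGID as a sequence-to-sequence model whose output is an intention sequence with temporal grounding. One common way to let a text decoder emit timestamps is to quantize them into discrete time tokens that share the vocabulary with words. The sketch below shows such an encoding and its inverse. The `<time_k>` token format, the 100-bin resolution, and the `to_token`, `serialize`, and `parse` helpers are assumptions for illustration, not the paper's specification.

```python
import re
from typing import List, Tuple

NUM_BINS = 100  # hypothetical temporal resolution: 100 bins across the video

Segment = Tuple[float, float, str]  # (start seconds, end seconds, intention text)


def to_token(t: float, duration: float, num_bins: int = NUM_BINS) -> str:
    """Quantize an absolute time into the nearest of num_bins relative-time tokens."""
    k = int(round(t / duration * (num_bins - 1)))
    k = min(max(k, 0), num_bins - 1)  # clamp times outside [0, duration]
    return f"<time_{k}>"


def serialize(segments: List[Segment], duration: float, num_bins: int = NUM_BINS) -> str:
    """Flatten intentions into one target string: <start> <end> text ..."""
    parts = [
        f"{to_token(s, duration, num_bins)} {to_token(e, duration, num_bins)} {text}"
        for s, e, text in segments
    ]
    return " ".join(parts)


# A segment is two time tokens followed by text up to the next time token or the end.
_SEGMENT = re.compile(r"<time_(\d+)>\s*<time_(\d+)>\s*(.*?)(?=\s*<time_\d+>|$)")


def parse(seq: str, duration: float, num_bins: int = NUM_BINS) -> List[Segment]:
    """Recover (start, end, text) triples from a decoded time-token sequence."""
    out: List[Segment] = []
    for m in _SEGMENT.finditer(seq):
        start = int(m.group(1)) / (num_bins - 1) * duration
        end = int(m.group(2)) / (num_bins - 1) * duration
        out.append((start, end, m.group(3).strip()))
    return out


if __name__ == "__main__":
    target = serialize([(1.0, 3.0, "cardiomegaly"), (4.0, 6.5, "pleural effusion")], 10.0)
    print(target)  # <time_10> <time_30> cardiomegaly <time_40> <time_64> pleural effusion
    print(parse(target, 10.0))
```

Because each timestamp snaps to the nearest of 100 evenly spaced bins, a round trip through `serialize` and `parse` recovers every start and end time to within half a bin, that is, duration / 198 seconds in this sketch.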