A Comparison of Object Detection and Phrase Grounding Models in Chest X-ray Abnormality Localization using Eye-tracking Data

Elham Ghelichkhan; Tolga Tasdizen

TL;DR

The paper addresses chest X-ray abnormality localization by comparing object detection (OD) and phrase grounding (PG), using an automatic pipeline driven by eye-tracking (ET) data to create explainability baselines. It repurposes REFLACX/MIMIC-CXR data to train and evaluate both approaches, showing that text-guided phrase grounding ($\text{mIoU}=0.36$) outperforms object detection ($\text{mIoU}=0.20$) and yields higher explainability (containment ratio, CR, of 0.48 vs. 0.26). An ET-based bounding-box generation process demonstrates that radiologists' gaze regions align with abnormalities and can be learned by models, with PG achieving superior coverage of relevant regions. The work provides a scalable framework for integrating eye-tracking data into local VLMs and suggests future enhancements, such as multiple boxes per statement and deeper integration of ET signals, to boost both accuracy and interpretability in clinical localization tasks.

Abstract

Chest diseases rank among the most prevalent and dangerous global health issues. Deep learning models for object detection and phrase grounding interpret complex radiology data to assist healthcare professionals in diagnosis. Object detection localizes abnormalities for a fixed set of classes, while phrase grounding localizes abnormalities described by free text. This paper investigates how text enhances abnormality localization in chest X-rays by comparing the performance and explainability of these two tasks. To establish an explainability baseline, we propose an automatic pipeline that generates image regions for report sentences using radiologists' eye-tracking data. The phrase grounding model achieves better performance (mIoU = 0.36 vs. 0.20) and better explainability (containment ratio = 0.48 vs. 0.26), which indicates that text is effective in enhancing chest X-ray abnormality localization.
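To make the reported mIoU figures concrete, the following is a minimal sketch of how a mean intersection-over-union score is commonly computed for pairs of predicted and reference bounding boxes. The box format `(x_min, y_min, x_max, y_max)` and the function names `box_iou` and `mean_iou` are assumptions made for illustration; the paper's own evaluation code and matching protocol may differ.

```python
from typing import Sequence, Tuple

# Assumed box format (not taken from the paper): (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle, if any
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])

    # Clamp at zero so disjoint boxes give zero intersection
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


def mean_iou(preds: Sequence[Box], targets: Sequence[Box]) -> float:
    """Average IoU over one-to-one (prediction, reference) pairs."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not preds:
        return 0.0
    return sum(box_iou(p, t) for p, t in zip(preds, targets)) / len(preds)
```

Under this definition, a score of 0.36 versus 0.20 means that, averaged over evaluated pairs, the predicted boxes of the phrase grounding model overlap their reference boxes more than those of the object detection model do.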
