Eyes Tell the Truth: GazeVal Highlights Shortcomings of Generative AI in Medical Imaging

David Wong, Bin Wang, Gorkem Durak, Marouane Tliba, Akshay Chaudhari, Aladine Chetouani, Ahmet Enis Cetin, Cagdas Topel, Nicolo Gennaro, Camila Lopes Vendrami, Tugce Agirlar Trabzonlu, Amir Ali Rahsepar, Laetitia Perronne, Matthew Antalek, Onural Ozturk, Gokcan Okur, Andrew C. Gordon, Ayis Pyrros, Frank H. Miller, Amir Borhani, Hatice Savas, Eric Hart, Drew Torigian, Jayaram K. Udupa, Elizabeth Krupinski, Ulas Bagci

TL;DR

This work tackles the gap between computational metrics and clinical realism in synthetic medical images by introducing GazeVal, a framework that combines radiologist eye-tracking with radiologist-driven evaluations across diagnostic and Visual Turing Test (VTT) tasks. Using RoentGen to generate diffusion-based synthetic chest X-rays conditioned on MIMIC-CXR reports, the study shows that although synthetic images can partly mimic how disease appears, radiologists still distinguish real from synthetic images with high accuracy, especially when pathologies are present. GazeVal also uncovers task-driven differences in visual attention: radiologists' gaze patterns, and the overlap of their attention, differ between the diagnostic and VTT tasks. The findings highlight the need for human-centric evaluation of synthetic data and point to ways of improving clinical realism, including extension to 3D modalities and attention-guided generation, to make synthetic data more trustworthy and useful in healthcare AI.
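The TL;DR does not say how attention overlap between the two tasks is measured. One common choice, assumed here purely for illustration, is to compare the binary gaze masks a radiologist produces for the same image under each task using intersection-over-union (IoU) and the Dice coefficient. The helper `overlap_scores` and the toy masks are hypothetical, not GazeVal's implementation.

```python
from typing import Sequence, Tuple

# A binary gaze mask as a 2D grid: 1 = pixel inside the attended region.
Mask = Sequence[Sequence[int]]


def overlap_scores(mask_a: Mask, mask_b: Mask) -> Tuple[float, float]:
    """Return (IoU, Dice) between two same-sized binary gaze masks.

    Hypothetical helper: IoU/Dice are assumed metrics, not necessarily GazeVal's.
    """
    if len(mask_a) != len(mask_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(mask_a, mask_b)
    ):
        raise ValueError("masks must have the same shape")
    inter = union = size_a = size_b = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            a, b = bool(a), bool(b)
            inter += int(a and b)  # attended in both tasks
            union += int(a or b)   # attended in at least one task
            size_a += int(a)
            size_b += int(b)
    if union == 0:
        # Neither task attended anywhere: define as perfect agreement.
        return 1.0, 1.0
    return inter / union, 2 * inter / (size_a + size_b)


# Toy masks for one image read under the two tasks (illustrative only).
diagnostic_mask = [[1, 1, 0],
                   [0, 1, 0],
                   [0, 0, 0]]
vtt_mask = [[0, 1, 1],
            [0, 1, 0],
            [0, 0, 0]]
iou, dice = overlap_scores(diagnostic_mask, vtt_mask)  # iou = 0.5, dice = 2/3
```

Dice weights the shared region more heavily than IoU, so it is somewhat more forgiving of small masks. Reporting both makes task-driven differences in attention easier to compare across studies.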

Abstract

The demand for high-quality synthetic data for model training and augmentation has never been greater in medical imaging. However, current evaluations rely predominantly on computational metrics that do not align with expert human judgment. As a result, synthetic images may score well numerically yet lack clinical authenticity, which makes it difficult to ensure the reliability and effectiveness of AI-driven medical tools. To address this gap, we introduce GazeVal, a practical framework that combines expert eye-tracking data with direct radiological evaluations to assess the quality of synthetic medical images. GazeVal uses radiologists' gaze patterns because they reveal how experts perceive and interact with synthetic data in different tasks (i.e., diagnostic tasks and Turing tests). In experiments with sixteen radiologists, 96.6% of the images generated by the latest state-of-the-art generative model were identified as fake, demonstrating the limitations of generative AI in producing clinically accurate images.
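The 96.6% figure is the share of synthetic images that readers labeled as fake. The sketch below shows one way such Visual Turing Test rates could be tallied from individual reader decisions. The `VTTResponse` record, the `vtt_rates` function, and the toy responses are hypothetical illustrations and do not reproduce the study's data.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class VTTResponse:
    """One reader decision in a Visual Turing Test (hypothetical record format)."""
    reader_id: str
    image_is_synthetic: bool  # ground truth
    judged_synthetic: bool    # the radiologist's call


def vtt_rates(responses: Iterable[VTTResponse]) -> Dict[str, float]:
    """Tally VTT outcomes.

    synthetic_detected: share of synthetic images judged synthetic
    real_accepted:      share of real images judged real
    accuracy:           share of all decisions that were correct
    """
    tp = fn = tn = fp = 0
    for r in responses:
        if r.image_is_synthetic and r.judged_synthetic:
            tp += 1  # synthetic image correctly flagged as fake
        elif r.image_is_synthetic:
            fn += 1  # synthetic image passed as real
        elif r.judged_synthetic:
            fp += 1  # real image wrongly flagged as fake
        else:
            tn += 1  # real image correctly accepted
    total = tp + fn + tn + fp
    if total == 0:
        raise ValueError("no responses")
    return {
        "synthetic_detected": tp / (tp + fn) if tp + fn else float("nan"),
        "real_accepted": tn / (tn + fp) if tn + fp else float("nan"),
        "accuracy": (tp + tn) / total,
    }


# Toy responses from a single hypothetical reader.
toy = [
    VTTResponse("R1", True, True),
    VTTResponse("R1", True, True),
    VTTResponse("R1", True, False),
    VTTResponse("R1", False, False),
    VTTResponse("R1", False, True),
]
rates = vtt_rates(toy)  # synthetic_detected = 2/3, real_accepted = 0.5, accuracy = 0.6
```

Reporting the synthetic-detection rate separately from overall accuracy matters here: a reader who calls every image fake would detect all synthetic images while rejecting every real one.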

Paper Structure

This paper contains 20 sections, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Overview of the proposed GazeVal framework, which introduces two tasks with corresponding evaluation metrics to quantitatively assess the quality of synthetic Chest X-ray images with expert knowledge.
  • Figure 2: The pipeline of generating synthetic chest X-ray images.
  • Figure 3: Synthetic X-rays generated by RoentGen with reports from MIMIC-CXR. Each X-ray is labeled with important features mentioned in the reports.
  • Figure 4: Top row: sample attention maps produced from radiologists' eye gaze. Bottom row: the corresponding gaze masks.
  • Figure 5: Left: Eye-tracking setup. The radiologist views the image on the monitor, with the eye tracker positioned between them. Middle: EyeLink 1000 Plus eye-tracker view with calibration software. Right: Eye tracker and example attention map.
  • ...and 3 more figures
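Figure 4 pairs gaze-derived attention maps with binary gaze masks, but this excerpt does not describe how they are built. A common construction, assumed here, sums a duration-weighted Gaussian around each fixation, normalizes the result to [0, 1], and thresholds it to obtain a mask. The functions `attention_map` and `gaze_mask`, the `sigma` value, and the 0.5 threshold are all illustrative assumptions rather than the paper's settings.

```python
import numpy as np


def attention_map(fixations, height, width, sigma=25.0):
    """Build a normalized attention map from (x, y, duration) fixations.

    Each fixation adds a duration-weighted isotropic Gaussian centred on the
    fixation point (x = column, y = row). sigma (pixels) is an assumed value.
    """
    ys, xs = np.mgrid[0:height, 0:width]  # row and column index grids
    amap = np.zeros((height, width), dtype=np.float64)
    for x, y, duration in fixations:
        amap += duration * np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
    peak = amap.max()
    return amap / peak if peak > 0 else amap  # scale so the hottest pixel is 1


def gaze_mask(amap, threshold=0.5):
    """Binarize an attention map: 1 where attention >= threshold (assumed cut-off)."""
    return (amap >= threshold).astype(np.uint8)


# Toy example: two fixations on a 32 x 64 image (illustrative values only).
fixations = [(10, 10, 300.0), (40, 20, 150.0)]
amap = attention_map(fixations, height=32, width=64, sigma=4.0)
mask = gaze_mask(amap)
```

The dense per-fixation sum costs O(fixations × pixels), which is fine for a sketch. For full-resolution chest X-rays, accumulating durations on a grid and smoothing once with `scipy.ndimage.gaussian_filter` gives a similar map (up to border handling) more cheaply.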