Mask of truth: model sensitivity to unexpected regions of medical images

Théo Sourget, Michelle Hestbek-Møller, Amelia Jiménez-Sánchez, Jack Junchi Xu, Veronika Cheplygina

TL;DR

The paper investigates how convolutional neural networks for medical image classification can rely on spurious cues when the region of interest is masked, using PadChest chest X-rays and Chákṣu eye-fundus data. It combines multiple ROI-masking strategies, embedding analyses, SHAP explanations, and a domain-expert radiology study to diagnose the prevalence and nature of shortcut learning. The findings show strong performance even without the ROI on some tasks and limited transfer to external data, highlighting the challenges of bias and generalization in medical imaging. The work argues for bias-focused evaluation, cautious interpretation of explainability, and the need for multimodal approaches and clinical grounding to ensure robust, clinically relevant AI systems.

Abstract

The development of larger models for medical image analysis has led to increased performance. However, it has also affected our ability to explain and validate model decisions. Models can use non-relevant parts of images, also called spurious correlations or shortcuts, to obtain high performance on benchmark datasets but fail in real-world scenarios. In this work, we challenge the capacity of convolutional neural networks (CNNs) to classify chest X-rays and eye fundus images while masking out clinically relevant parts of the image. We show that all models trained on the PadChest dataset, irrespective of the masking strategy, are able to obtain an Area Under the Curve (AUC) above random. Moreover, the models trained on full images obtain good performance on images without the region of interest (ROI), even better than the performance obtained on images containing only the ROI. We also reveal a possible spurious correlation in the Chákṣu dataset, although performance there is more aligned with what is expected of an unbiased model. We go beyond the performance analysis by using the explainability method SHAP and analysing embeddings. We also asked a radiology resident to interpret chest X-rays under different masking strategies to complement our findings with clinical knowledge. Our code is available at https://github.com/TheoSourget/MMC_Masking and https://github.com/TheoSourget/MMC_Masking_EyeFundus
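To make the idea of masking concrete, the following is a minimal sketch of how an image could be split into an "ROI only" version and a "without ROI" version given a binary mask. It is not taken from the authors' repositories: the function name `apply_mask`, the `fill_value` parameter, and the toy 4x4 image are all hypothetical, and the masking strategies actually used in the paper may differ (for example in how the masked pixels are filled).

```python
import numpy as np


def apply_mask(image, roi_mask, keep_roi, fill_value=0.0):
    """Return a copy of `image` where either the ROI or its complement is blanked.

    Hypothetical helper for illustration only.
    keep_roi=True  -> keep pixels inside the ROI, blank everything else.
    keep_roi=False -> blank the ROI, keep everything else.
    """
    image = np.asarray(image, dtype=float)
    roi = np.asarray(roi_mask, dtype=bool)
    if image.shape != roi.shape:
        raise ValueError("image and roi_mask must have the same shape")
    keep = roi if keep_roi else ~roi
    # np.where leaves the input untouched and returns a new array.
    return np.where(keep, image, fill_value)


# Toy example: a 4x4 "image" whose central 2x2 block plays the role of the ROI.
image = np.arange(16, dtype=float).reshape(4, 4)
roi_mask = np.zeros((4, 4), dtype=bool)
roi_mask[1:3, 1:3] = True

only_roi = apply_mask(image, roi_mask, keep_roi=True)
no_roi = apply_mask(image, roi_mask, keep_roi=False)
```

With a fill value of zero, the two versions are complementary: every pixel survives in exactly one of them, so a model that still scores well on `no_roi` must be drawing on information outside the ROI.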

Paper Structure

This paper contains 21 sections, 19 figures, 4 tables.

Figures (19)

  • Figure 1: Example of chest X-ray from PadChest dataset and associated mask from CheXmask
  • Figure 2: Examples of eye fundus from each camera in the Chákṣu dataset
  • Figure 3: Example of all masking strategies used in our study. Each strategy is used to train a separate model
  • Figure 4: Mean AUC across the five models from 5-fold cross-validation on the testing set with different masking for training and evaluation images. The x-axis shows the masking strategy of the images used to evaluate the model, the y-axis shows the masking strategy of the images used to train the model. The color range in both figures is different to fit the range of the specific set.
  • Figure 5: Evolution of the AUC with standard deviation of the models trained on Full images applied to (a) images with only lungs and (b) images without the lungs while dilating the mask.
  • ...and 14 more figures
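
The Figure 4 caption refers to a mean AUC across the five models from 5-fold cross-validation, and the abstract compares AUC against random performance. As a rough sketch of that kind of summary, the code below computes the AUC as the probability that a positive case is scored above a negative one (ties count as one half) and averages it over folds. The function names and the toy labels and scores are hypothetical; the paper's own evaluation code is not shown here and may rely on a library implementation instead.

```python
def auc(labels, scores):
    """Binary AUC via pairwise comparison of positive and negative scores.

    A value of 0.5 corresponds to random ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(pos) * len(neg))


def mean_auc(folds):
    """Average the per-fold AUC over an iterable of (labels, scores) pairs."""
    values = [auc(labels, scores) for labels, scores in folds]
    return sum(values) / len(values)


# Toy data for two folds (made up for illustration, not results from the paper).
folds = [
    ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
    ([0, 1, 0, 1], [0.2, 0.9, 0.3, 0.6]),
]
summary = mean_auc(folds)
```

Repeating this for every pair of training and evaluation masking strategies would fill one cell per pair in a grid like the one the caption describes.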