Mask of truth: model sensitivity to unexpected regions of medical images

Théo Sourget; Michelle Hestbek-Møller; Amelia Jiménez-Sánchez; Jack Junchi Xu; Veronika Cheplygina

TL;DR

The paper investigates how convolutional neural networks for medical image classification can rely on spurious cues when the region of interest is masked, using PadChest chest X-rays and Chákṣu eye-fundus data. It combines multiple ROI-masking strategies, embedding analyses, SHAP explanations, and a domain-expert radiology study to diagnose the prevalence and nature of shortcut learning. The findings show strong performance even without the ROI on some tasks and limited transfer to external data, highlighting the challenges of bias and generalization in medical imaging. The work argues for bias-focused evaluation, cautious interpretation of explainability, and the need for multimodal approaches and clinical grounding to ensure robust, clinically relevant AI systems.

Abstract

The development of larger models for medical image analysis has led to increased performance. However, it has also made it harder to explain and validate model decisions. Models can rely on clinically irrelevant parts of images, so-called spurious correlations or shortcuts, to obtain high performance on benchmark datasets, yet fail in real-world scenarios. In this work, we challenge the capacity of convolutional neural networks (CNNs) to classify chest X-rays and eye fundus images when clinically relevant parts of the image are masked out. We show that all models trained on the PadChest dataset, irrespective of the masking strategy, obtain an Area Under the Curve (AUC) above chance level. Moreover, the models trained on full images perform well on images without the region of interest (ROI), even better than on images containing only the ROI. We also reveal a possible spurious correlation in the Chaksu dataset, although there the performance is more in line with what one would expect from an unbiased model. We go beyond performance analysis by using the explainability method SHAP and by analysing the model embeddings. To complement our findings with clinical knowledge, we asked a radiology resident to interpret chest X-rays under different masking strategies. Our code is available at https://github.com/TheoSourget/MMC_Masking and https://github.com/TheoSourget/MMC_Masking_EyeFundus.
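The abstract contrasts two complementary inputs: images with the ROI removed and images containing only the ROI. The sketch below shows one way to build both from a binary ROI mask using NumPy. It assumes the mask has the same height and width as the image and that masked pixels are overwritten with a constant value (zero, i.e. black). The helper name `apply_roi_mask` and this fill strategy are hypothetical and are not taken from the authors' repositories. The abstract does not state how the masking is actually applied.

```python
import numpy as np


def apply_roi_mask(image: np.ndarray, roi_mask: np.ndarray,
                   keep_roi: bool, fill_value: float = 0.0) -> np.ndarray:
    """Hypothetical helper: mask an image relative to a binary ROI.

    keep_roi=True  -> keep only the ROI and fill everything else ("ROI only").
    keep_roi=False -> fill the ROI and keep the rest ("no ROI").
    The constant fill (default 0, i.e. black) is an assumption; other
    strategies, such as blurring or noise, are equally plausible.
    Works for 2D (H, W) and multi-channel (H, W, C) images.
    """
    if image.shape[:2] != roi_mask.shape:
        raise ValueError("roi_mask must match the image's spatial shape")
    roi = roi_mask.astype(bool)
    out = image.astype(np.float32, copy=True)
    # Pick the pixels to overwrite: outside the ROI, or inside it.
    region = ~roi if keep_roi else roi
    out[region] = fill_value  # broadcasts over channels for (H, W, C) input
    return out


if __name__ == "__main__":
    image = np.arange(1, 17, dtype=np.float32).reshape(4, 4)
    roi_mask = np.zeros((4, 4), dtype=bool)
    roi_mask[1:3, 1:3] = True  # toy central 2x2 "region of interest"

    print(apply_roi_mask(image, roi_mask, keep_roi=True))   # ROI only
    print(apply_roi_mask(image, roi_mask, keep_roi=False))  # ROI removed
```

With a zero fill, the two variants are exact complements, so their sum recovers the original image. Evaluating one trained classifier on both versions of a test set and comparing each AUC with the 0.5 chance level corresponds to the kind of comparison the abstract reports.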
