Benchmarking Dependence Measures to Prevent Shortcut Learning in Medical Imaging

Sarah Müller, Louisa Fay, Lisa M. Koch, Sergios Gatidis, Thomas Küstner, Philipp Berens

TL;DR

The paper benchmarks invariant representation learning approaches for preventing shortcut learning in medical imaging, where confounders such as acquisition devices and sites induce spurious correlations. It compares subspace disentanglement based on dependence measures, namely mutual information (MI) estimated with MINE and distance correlation (dCor), against adversarial classifiers trained with a gradient reversal layer (GRL). Both are evaluated on Morpho-MNIST and CheXpert to assess how well they decouple the primary task from spurious factors. Across experiments, MINE generally provides the strongest protection against confounding and the most robust disentanglement, at the cost of longer training. Data rebalancing can boost inverted-task performance but may underperform in disentanglement on some datasets. The results guide the choice of shortcut-mitigation strategy in medical imaging: MI-based objectives offer strong generalization under distribution shift, whereas faster, supervised methods may trade robustness for efficiency. The authors suggest future work on more datasets and on additional dependence measures such as MMD.

Abstract

Medical imaging cohorts are often confounded by factors such as acquisition devices, hospital sites, patient backgrounds, and many more. As a result, deep learning models tend to learn spurious correlations instead of causally related features, limiting their generalizability to new and unseen data. This problem can be addressed by minimizing dependence measures between intermediate representations of task-related and non-task-related variables. These measures include mutual information, distance correlation, and the performance of adversarial classifiers. Here, we benchmark such dependence measures for the task of preventing shortcut learning. We study a simplified setting using Morpho-MNIST and a medical imaging task with CheXpert chest radiographs. Our results provide insights into how to mitigate confounding factors in medical imaging.
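
To make one of the dependence measures named above concrete, here is a minimal NumPy sketch of the empirical (biased) distance correlation between two sets of samples, for example a task subspace $z_1$ and a confounder label $y_2$. It is an illustration under stated assumptions, not the authors' implementation: the function names `centered_distances` and `distance_correlation` are hypothetical, and a training loss would need a differentiable version in a deep learning framework.

```python
import numpy as np


def centered_distances(x):
    """Double-centered pairwise Euclidean distance matrix of the samples in x."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D array as n samples of dimension 1
    # Pairwise distances, shape (n, n).
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    # Subtract row and column means, then add back the grand mean.
    return (d - d.mean(axis=0, keepdims=True)
              - d.mean(axis=1, keepdims=True) + d.mean())


def distance_correlation(x, y):
    """Empirical distance correlation in [0, 1]; x and y share the sample axis."""
    a = centered_distances(x)
    b = centered_distances(y)
    dcov2_xy = (a * b).mean()  # squared sample distance covariance
    dcov2_xx = (a * a).mean()  # squared sample distance variance of x
    dcov2_yy = (b * b).mean()  # squared sample distance variance of y
    denom = np.sqrt(dcov2_xx * dcov2_yy)
    if denom == 0.0:
        return 0.0  # at least one input is constant
    return float(np.sqrt(max(dcov2_xy, 0.0) / denom))


# Hypothetical usage: z1 stands in for a latent subspace, y2 for a confounder.
rng = np.random.default_rng(0)
z1 = rng.normal(size=(500, 4))
y2_independent = rng.normal(size=500)
y2_dependent = z1[:, 0] ** 2  # nonlinear dependence on the first latent dimension

dcor_independent = distance_correlation(z1, y2_independent)
dcor_dependent = distance_correlation(z1, y2_dependent)
```

In this sketch, `dcor_dependent` should come out clearly larger than `dcor_independent`, although the dependence is purely nonlinear. This is why such a measure can serve as a penalty between a task subspace and a confounder, where a linear correlation would miss the relationship.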

Paper Structure (17 sections, 9 equations, 4 figures, 6 tables)

Figures (4)

  • Figure 1: Overview of the causal graph (a) and how the compared methods address the shortcut connection (b). Robust inference performance of task $y_1$ on a shifted data distribution is only possible if the latent space is independent of the confounder $y_2$ (c).
  • Figure 2: Absolute label co-occurrence matrices of the training data.
  • Figure 3: Latent subspace $z_1$ trained on Morpho-MNIST, colored by writing style $y_2$.
  • Figure 4: Latent subspace $z_1$ trained on CheXpert, colored by sex $y_2$.