Mitigating Shortcut Learning via Feature Disentanglement in Medical Imaging: A Benchmark Study
Sarah Müller, Philipp Berens
TL;DR
This study tackles shortcut learning in medical imaging by benchmarking feature disentanglement methods that separate task-relevant information from confounder information in the latent space. It systematically compares data-centric rebalancing with model-centric latent-space disentanglement approaches, including distance correlation, mutual information (MI) estimation, and maximum mean discrepancy (MMD), on a controlled toy dataset and two medical imaging datasets with strong confounding. Both data-centric and model-centric strategies improve primary-task performance under distribution shifts, and combining rebalancing with disentanglement (especially via distance correlation) yields robust gains with favorable computational efficiency. The work offers practical guidance for designing robust, generalizable medical imaging models and shows that latent-space analyses reveal differences in disentanglement quality that standard AUROC metrics do not capture.
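To make the idea of dependence minimization between latent subspaces concrete, here is a minimal NumPy sketch of the standard (biased, V-statistic) sample distance correlation between a "task" subspace and a "confounder" subspace. It is an illustration under assumptions, not the authors' implementation: the function names and the toy latent arrays are hypothetical, and in actual training the same computation would be written in a differentiable framework and added to the classification loss as a penalty on each mini-batch.

```python
import numpy as np


def _double_centered_distances(x) -> np.ndarray:
    """Pairwise Euclidean distance matrix with row, column and grand means removed."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D array as n samples of one feature
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y) -> float:
    """Biased sample distance correlation between paired samples x (n, p) and y (n, q)."""
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    dcov2_xy = (a * b).mean()  # squared distance covariance
    dcov2_xx = (a * a).mean()  # squared distance variance of x
    dcov2_yy = (b * b).mean()  # squared distance variance of y
    denom = np.sqrt(dcov2_xx * dcov2_yy)
    if denom == 0.0:
        return 0.0  # a constant subspace carries no dependence
    # max() guards against tiny negative values from floating-point rounding
    return float(np.sqrt(max(dcov2_xy, 0.0) / denom))


# Hypothetical toy latents: one confounder subspace independent of the task
# subspace, one that leaks task information.
rng = np.random.default_rng(0)
z_task = rng.normal(size=(200, 4))
z_conf_indep = rng.normal(size=(200, 4))
z_conf_leaky = z_task[:, :2] + 0.1 * rng.normal(size=(200, 2))
print(distance_correlation(z_task, z_conf_indep))
print(distance_correlation(z_task, z_conf_leaky))
```

Distance correlation lies in [0, 1] and, unlike Pearson correlation, also detects nonlinear dependence; it equals 1 when one subspace is a scaled and shifted copy of the other. The biased estimator is inflated for small samples, so values for independent subspaces computed on small batches stay noticeably above 0.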
Abstract
Although deep learning models in medical imaging often achieve excellent classification performance, they can rely on shortcut learning, exploiting spurious correlations or confounding factors that are not causally related to the target task. This poses risks in clinical settings, where models must generalize across institutions, populations, and acquisition conditions. Feature disentanglement is a promising approach to mitigating shortcut learning by separating task-relevant information from confounder-related features in latent representations. In this study, we systematically evaluated feature disentanglement methods for mitigating shortcuts in medical imaging, including adversarial learning and latent space splitting based on dependence minimization. We assessed classification performance and disentanglement quality using latent space analyses on one artificial and two medical datasets with natural and synthetic confounders. We also examined robustness under varying levels of confounding and compared computational efficiency across methods. We found that shortcut mitigation methods improved classification performance when the training data contained strong spurious correlations. Latent space analyses revealed differences in representation quality not captured by classification metrics, highlighting the strengths and limitations of each method. Model reliance on shortcuts depended on the degree of confounding in the training data. The best-performing models combined data-centric rebalancing with model-centric disentanglement, achieving stronger and more robust shortcut mitigation than rebalancing alone while maintaining similar computational efficiency.
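The data-centric side can be illustrated with one common form of rebalancing: weighting (or resampling) training examples so that every combination of target label and confounder value carries equal total weight. Provided every combination occurs at least once, this makes label and confounder independent in the effective training distribution. The exact rebalancing scheme used in the study is not specified here; the helper names and the toy data below are hypothetical, and a model-centric penalty (for example a distance-correlation term between latent subspaces) would be added on top of the weighted classification loss.

```python
import numpy as np


def group_balanced_weights(labels, confounders) -> np.ndarray:
    """Hypothetical helper: per-sample weights giving every observed
    (label, confounder) group the same total weight, normalized to mean 1."""
    pairs = np.stack([np.asarray(labels), np.asarray(confounders)], axis=1)
    # inverse maps each sample to its group; counts holds the group sizes
    _, inverse, counts = np.unique(
        pairs, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.reshape(-1)  # NumPy versions differ in the returned shape
    weights = 1.0 / counts[inverse]
    return weights * (weights.size / weights.sum())


def balanced_resample(labels, confounders, rng, size=None) -> np.ndarray:
    """Hypothetical helper: draw sample indices with replacement according to
    the group-balanced weights (an alternative to loss weighting)."""
    w = group_balanced_weights(labels, confounders)
    n = w.size if size is None else size
    return rng.choice(w.size, size=n, replace=True, p=w / w.sum())


# Toy, strongly confounded data: the confounder agrees with the label 90% of the time.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)
c = np.where(rng.random(1000) < 0.9, y, 1 - y)

w = group_balanced_weights(y, c)    # per-sample weights for a weighted loss
idx = balanced_resample(y, c, rng)  # or indices for a rebalanced training set
```

Loss weighting keeps every sample but can increase the variance of the loss when minority groups (here, cases where confounder and label disagree) are small; resampling trades this for repeated minority examples.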
