An investigation into the causes of race bias in AI-based cine CMR segmentation
Tiarna Lee, Esther Puyol-Anton, Bram Ruijsink, Sebastien Roujol, Theodore Barfoot, Shaheim Ogbomo-Harmitt, Miaojing Shi, Andrew P. King
TL;DR
This study investigates the root causes of race bias in AI-based cine CMR segmentation using short-axis images from the UK Biobank. Combining classification and segmentation experiments with interpretability analyses (Grad-CAM and latent-space PCA) and a cropping-based intervention, the authors show that most race-discriminative information resides in image content outside the heart, such as subcutaneous fat and artefacts. Cropping to the heart region reduces but does not eliminate segmentation bias, which points to residual differences within the heart region and to possible confounders; matching by MRI year may mitigate some of the effect but not all of it. The work highlights practical mitigation strategies, including ROI-based cropping and better-balanced training data, and emphasizes the need for region-aware inference and broader representation to support fair AI-based CMR tools.
Abstract
Artificial intelligence (AI) methods are being used increasingly for the automated segmentation of cine cardiac magnetic resonance (CMR) imaging. However, these methods have been shown to be subject to race bias, i.e. they exhibit different levels of performance for different races depending on the (im)balance of the data used to train the AI model. In this paper we investigate the source of this bias, seeking to understand its root cause(s) so that it can be effectively mitigated. We perform a series of classification and segmentation experiments on short-axis cine CMR images acquired from Black and White subjects from the UK Biobank and apply AI interpretability methods to understand the results. In the classification experiments, we find that race can be predicted with high accuracy from the images alone, but less accurately from ground truth segmentations, suggesting that the distributional shift between races, which is often the cause of AI bias, is mostly image-based rather than segmentation-based. The interpretability methods show that most attention in the classification models is focused on non-heart regions, such as subcutaneous fat. Cropping the images tightly around the heart reduces classification accuracy to around chance level. Similarly, race can be predicted from the latent representations of a biased segmentation model, suggesting that race information is encoded in the model. Cropping images tightly around the heart reduces but does not eliminate segmentation bias. We also investigate the influence of possible confounders on the bias observed.
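
As an illustration of the cropping step described above, the following is a minimal sketch of how a short-axis slice might be cropped tightly around the heart using the bounding box of a segmentation mask. The abstract does not specify the exact cropping procedure, so the function names (`heart_bounding_box`, `crop_to_heart`), the optional `margin` parameter, and the use of a mask-derived bounding box are all assumptions made for illustration. Images and masks are represented as plain nested lists so that the example has no dependencies.

```python
def heart_bounding_box(mask, margin=0):
    """Return (row_start, row_stop, col_start, col_stop) enclosing all
    non-zero mask pixels, expanded by `margin` pixels and clipped to the
    image borders. Stop indices are exclusive.

    Hypothetical helper: the bounding-box approach and `margin` are
    assumptions, not the procedure confirmed by the paper.
    """
    height, width = len(mask), len(mask[0])
    # Rows and columns that contain at least one labelled (non-zero) pixel.
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(width) if any(row[c] for row in mask)]
    if not rows:
        raise ValueError("mask contains no foreground pixels")
    row_start = max(rows[0] - margin, 0)
    row_stop = min(rows[-1] + 1 + margin, height)
    col_start = max(cols[0] - margin, 0)
    col_stop = min(cols[-1] + 1 + margin, width)
    return row_start, row_stop, col_start, col_stop


def crop_to_heart(image, mask, margin=0):
    """Crop `image` to the bounding box of the non-zero region of `mask`."""
    r0, r1, c0, c1 = heart_bounding_box(mask, margin)
    return [row[c0:c1] for row in image[r0:r1]]


# Toy 5x6 "image" whose pixel value encodes its position (10 * row + col),
# with a small labelled region standing in for the heart segmentation.
image = [[10 * r + c for c in range(6)] for r in range(5)]
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 2, 0, 0],
    [0, 0, 0, 3, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

tight_crop = crop_to_heart(image, mask)             # 3 rows x 2 columns
padded_crop = crop_to_heart(image, mask, margin=1)  # 5 rows x 4 columns
```

With `margin=0` the crop keeps only the rows and columns that contain labelled pixels, which removes surrounding content such as subcutaneous fat. A larger margin retains some context around the heart, so the choice of margin determines how much non-heart content remains in the cropped image.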
