
An investigation into the causes of race bias in AI-based cine CMR segmentation

Tiarna Lee, Esther Puyol-Anton, Bram Ruijsink, Sebastien Roujol, Theodore Barfoot, Shaheim Ogbomo-Harmitt, Miaojing Shi, Andrew P. King

TL;DR

This study investigates the root causes of race bias in AI-driven cine CMR segmentation using short-axis images from the UK Biobank. By combining classification and segmentation experiments with interpretability analyses (Grad-CAM and latent-space PCA) and a cropping-based intervention, the authors show that most of the race-discriminative information resides in image content outside the heart, such as subcutaneous fat and artefacts. Cropping to the heart region reduces but does not fully remove segmentation bias. This points to residual race-related differences within the heart region and to possible confounders; matching by MRI year could mitigate some of these effects but not all. The work highlights practical mitigation strategies, including ROI-based cropping and improved dataset balance, and emphasizes the need for region-aware inference and broader representation to foster fair AI tools for CMR.
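Grad-CAM, one of the interpretability analyses named above, weights each feature map of a chosen convolutional layer by the spatially averaged gradient of the target class score and keeps only the positive part of the weighted sum. The NumPy sketch below illustrates just that combination step. It assumes that the activations and gradients for one image have already been extracted from a trained race classifier, for example with framework hooks. The function name `grad_cam_map` and the nearest-neighbour upsampling are hypothetical illustrative choices, not the authors' implementation.

```python
import numpy as np


def grad_cam_map(activations: np.ndarray, gradients: np.ndarray, upsample: int = 1) -> np.ndarray:
    """Combine convolutional feature maps into a Grad-CAM heatmap.

    activations, gradients: arrays of shape (K, H, W) for a single image, where
    gradients holds d(class score)/d(activations) for the target class
    (assumed to have been extracted beforehand, e.g. via hooks).
    upsample: integer factor for nearest-neighbour upsampling, a hypothetical
    stand-in for resizing the map back to the input image size.
    """
    if activations.shape != gradients.shape:
        raise ValueError("activations and gradients must have the same shape")
    # alpha_k: global-average-pooled gradient for each of the K channels
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum over channels -> (H, W), then ReLU keeps positive evidence only
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)
    # Nearest-neighbour upsampling: repeat each pixel in an upsample x upsample block
    cam = np.kron(cam, np.ones((upsample, upsample)))
    # Normalise to [0, 1] so heatmaps from different images are comparable
    peak = cam.max()
    return cam / peak if peak > 0 else cam


if __name__ == "__main__":
    # Synthetic activations/gradients, standing in for a real classifier's layer
    rng = np.random.default_rng(0)
    acts = rng.random((8, 7, 7))
    grads = rng.standard_normal((8, 7, 7))
    heatmap = grad_cam_map(acts, grads, upsample=32)
    print(heatmap.shape)
```

In practice, the heatmap would be overlaid on the normalised CMR slice. High values outside the heart, as in the examples described for Figure 2, indicate that the classifier relies on non-cardiac content.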

Abstract

Artificial intelligence (AI) methods are being used increasingly for the automated segmentation of cine cardiac magnetic resonance (CMR) imaging. However, these methods have been shown to be subject to race bias, i.e. they exhibit different levels of performance for different races depending on the (im)balance of the data used to train the AI model. In this paper we investigate the source of this bias, seeking to understand its root cause(s) so that it can be effectively mitigated. We perform a series of classification and segmentation experiments on short-axis cine CMR images acquired from Black and White subjects from the UK Biobank and apply AI interpretability methods to understand the results. In the classification experiments, we found that race can be predicted with high accuracy from the images alone, but less accurately from ground truth segmentations, suggesting that the distributional shift between races, which is often the cause of AI bias, is mostly image-based rather than segmentation-based. The interpretability methods showed that most attention in the classification models was focused on non-heart regions, such as subcutaneous fat. Cropping the images tightly around the heart reduced classification accuracy to around chance level. Similarly, race can be predicted from the latent representations of a biased segmentation model, suggesting that race information is encoded in the model. Cropping images tightly around the heart reduced but did not eliminate segmentation bias. We also investigate the influence of possible confounders on the bias observed.
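The abstract does not specify exactly how the images were cropped "tightly around the heart". A minimal sketch, assuming the crop is taken from the bounding box of a heart segmentation plus a fixed pixel margin, could look like the code below. The function name `crop_to_heart`, the default margin, and the convention that any non-zero mask label marks a cardiac structure are all hypothetical.

```python
import numpy as np


def crop_to_heart(image: np.ndarray, mask: np.ndarray, margin: int = 10) -> tuple[np.ndarray, np.ndarray]:
    """Crop a 2D short-axis slice and its mask to the heart bounding box.

    image, mask: 2D arrays of the same shape; mask > 0 marks cardiac labels
    (assumed convention, e.g. LV blood pool, LV myocardium, RV blood pool).
    margin: extra pixels kept on each side of the box (hypothetical default).
    """
    if image.shape != mask.shape:
        raise ValueError("image and mask must have the same shape")
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        raise ValueError("mask contains no heart pixels")
    # Bounding box of the labelled region, expanded by the margin and
    # clipped to the image borders (upper bounds are exclusive for slicing)
    r0 = max(int(rows.min()) - margin, 0)
    r1 = min(int(rows.max()) + margin + 1, image.shape[0])
    c0 = max(int(cols.min()) - margin, 0)
    c1 = min(int(cols.max()) + margin + 1, image.shape[1])
    return image[r0:r1, c0:c1], mask[r0:r1, c0:c1]


if __name__ == "__main__":
    # Synthetic slice with a rectangular "heart" label
    image = np.random.default_rng(0).random((128, 128))
    mask = np.zeros((128, 128), dtype=np.uint8)
    mask[50:70, 60:90] = 1
    cropped_image, cropped_mask = crop_to_heart(image, mask, margin=5)
    print(cropped_image.shape)
```

Crops produced this way vary in size, so they would typically be resized or padded to a fixed shape before being passed to a classifier or segmentation network. At inference time, no ground-truth mask is available, so a deployed version would need a predicted mask or a separate heart-localisation step.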

Paper Structure

This paper contains 13 sections, 9 figures and 5 tables.

Figures (9)

  • Figure 1: An illustration of the combination of images and segmentations used as input to the protected attribute classifiers
  • Figure 2: Examples of the normalised CMR images and Grad-CAM images for the classification model trained on the Im-Im-Im dataset for Black vs White subjects. Higher values (red) correspond to areas important for race classification; lower values (blue) correspond to less important areas. The top image shows a heatmap in which the non-heart regions have higher activations; the bottom image shows a heatmap in which artefacts have higher activations.
  • Figure 3: Examples of images including and excluding the heart: (a) image cropped around the heart; (b) image with the heart blurred.
  • Figure 4: Overall Dice similarity coefficient (DSC) for segmentation experiments using original (a) and cropped (b) CMR images. Statistical significance was tested using a Mann-Whitney U test and is denoted by **** (p $\leq$ 0.0001), *** (0.0001 $<$ p $\leq$ 0.001), ** (0.001 $<$ p $\leq$ 0.01), * (0.01 $<$ p $\leq$ 0.05), ns (p $>$ 0.05).
  • Figure S1: Components 1 and 2 of a principal component analysis (PCA) of the nnU-Net latent-space representations of CMR images.
  • ...and 4 more figures
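Figure 4 summarises segmentation performance with the Dice similarity coefficient, DSC = 2|A ∩ B| / (|A| + |B|), and marks Mann-Whitney U test results with star notation. The sketch below computes a binary DSC, runs a two-sided test with SciPy's `scipy.stats.mannwhitneyu`, and maps the p-value to the thresholds listed in that caption. The helper names are hypothetical. The per-subject DSC values in the usage example are randomly generated placeholders, not results from the paper, and treating two empty masks as DSC = 1 is an assumed convention.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """Binary Dice similarity coefficient between two masks."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    denom = int(pred.sum()) + int(truth.sum())
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return float(2.0 * np.logical_and(pred, truth).sum() / denom)


def significance_stars(p: float) -> str:
    """Map a p-value to the star notation used in the Figure 4 caption."""
    if p <= 0.0001:
        return "****"
    if p <= 0.001:
        return "***"
    if p <= 0.01:
        return "**"
    if p <= 0.05:
        return "*"
    return "ns"


def compare_groups(dsc_a, dsc_b) -> tuple[float, str]:
    """Two-sided Mann-Whitney U test between two groups' per-subject DSCs."""
    _, p = mannwhitneyu(dsc_a, dsc_b, alternative="two-sided")
    return float(p), significance_stars(p)


if __name__ == "__main__":
    # Synthetic per-subject DSCs for two hypothetical groups (not real data)
    rng = np.random.default_rng(1)
    group_a = rng.normal(0.90, 0.03, size=50)
    group_b = rng.normal(0.87, 0.04, size=50)
    p, stars = compare_groups(group_a, group_b)
    print(f"p = {p:.2g} ({stars})")
```

A rank-based test such as Mann-Whitney U is a reasonable fit here because per-subject DSC distributions are usually bounded and skewed, so a normality assumption would be hard to justify.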