Quantifying the Impact of Population Shift Across Age and Sex for Abdominal Organ Segmentation

Kate Čevora, Ben Glocker, Wenjia Bai

TL;DR

This work addresses domain generalisation in abdominal CT segmentation by quantifying how population shift due to age and sex affects segmentation performance. It introduces a performance-gap metric to measure the maximal impact of population shift and compares it to cross-dataset shift using large public datasets, applying a state-of-the-art 3D nnU-Net. The study finds population shift to be a significant, asymmetric, and dataset-dependent factor, with impacts comparable to cross-dataset shift, and shows that training-data diversity measured by organ-volume variance correlates with better generalisation. The authors argue that demographic balancing alone is insufficient for fairness and advocate developing image-feature–based diversity metrics to guide dataset curation and augmentation for robust, equitable abdominal organ segmentation.

Abstract

Deep learning-based medical image segmentation has seen tremendous progress over the last decade, but there is still relatively little transfer into clinical practice. One of the main barriers is the challenge of domain generalisation, which requires segmentation models to maintain high performance across a wide distribution of image data. This challenge is amplified by the many factors that contribute to the diverse appearance of medical images, such as acquisition conditions and patient characteristics. The impact of shifting patient characteristics such as age and sex on segmentation performance remains relatively under-studied, especially for abdominal organs, even though it is crucial for ensuring the fairness of the segmentation model. We perform the first study to determine the impact of population shift with respect to age and sex on abdominal CT image segmentation, by leveraging two large public datasets, and introduce a novel metric to quantify the impact. We find that population shift is a challenge similar in magnitude to cross-dataset shift for abdominal organ segmentation, and that the effect is asymmetric and dataset-dependent. We conclude that dataset diversity in terms of known patient characteristics is not necessarily equivalent to dataset diversity in terms of image features. This implies that simple population matching to ensure good generalisation and fairness may be insufficient, and we recommend that fairness research should be directed towards better understanding and quantifying medical image dataset diversity in terms of performance-relevant characteristics such as organ morphology.

Paper Structure (21 sections, 1 equation, 2 figures, 5 tables)

Figures (2)

  • Figure 1: Causal diagram illustrating major factors that can influence medical image appearance and associated segmentation. The factors can be split into three broad groups: patient characteristics which directly influence patient anatomy, acquisition conditions which influence image appearance, and annotation protocol which influences manual segmentation style.
  • Figure 2: Plots of segmentation performance in terms of Dice score on the test set against the proxy measure of training set diversity, the standard deviation of organ volumes. The test set data has been split by colour-coded subgroups. The top row reports results on the TotalSegmentator dataset (TS) and the bottom row reports results on AMOS.
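
The two quantities plotted in Figure 2 can be illustrated with a minimal sketch. It assumes binary organ masks stored as NumPy arrays and a known voxel spacing in millimetres. The function names (`dice_score`, `organ_volume_ml`, `volume_std`) and the choice of the sample standard deviation are hypothetical, chosen for illustration, and are not taken from the paper's implementation.

```python
import numpy as np


def dice_score(pred, target):
    """Dice overlap between two binary masks (1.0 means perfect agreement)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Assumption: two empty masks are treated as a perfect match.
        return 1.0
    return float(2.0 * np.logical_and(pred, target).sum() / denom)


def organ_volume_ml(mask, voxel_spacing_mm):
    """Organ volume in millilitres from a binary mask and voxel spacing (mm)."""
    voxel_volume_mm3 = float(np.prod(voxel_spacing_mm))
    return float(np.asarray(mask, dtype=bool).sum() * voxel_volume_mm3 / 1000.0)


def volume_std(masks, spacings):
    """Sample standard deviation of organ volumes across a set of scans.

    Used here as a rough proxy for training-set diversity; ddof=1 is an
    assumption, the source does not state which estimator is used.
    """
    volumes = [organ_volume_ml(m, s) for m, s in zip(masks, spacings)]
    return float(np.std(volumes, ddof=1))
```

In this sketch, a larger `volume_std` over the training masks for a given organ would correspond to a point further to the right on the horizontal axis of Figure 2, with `dice_score` on the test set giving the vertical position.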