To which reference class do you belong? Measuring racial fairness of reference classes with normative modeling
Saige Rutherford, Thomas Wolfers, Charlotte Fraza, Nathaniel G. Harnett, Christian F. Beckmann, Henricus G. Ruhe, Andre F. Marquand
TL;DR
This paper investigates how the racial composition of reference classes used in normative modeling affects the interpretation of deviations in brain structure. By comparing pre-trained, race-not-included, and race-included normative models on two large neuroimaging cohorts (HCP and UKB), the authors quantify racial biases in deviation scores and residuals and demonstrate that race can be predicted from model features with high accuracy. They reveal persistent racial disparities even when race is included as a predictor, highlighting that deviations may reflect demographic mismatch with the reference class rather than true pathology. The work emphasizes the urgency of collecting more representative, granular data and promotes transparent reporting to responsibly translate normative-model deviations into clinical meaning and health equity gains.
Abstract
Reference classes in healthcare establish healthy norms, such as pediatric growth charts of height and weight, and are used to chart deviations from these norms, which represent potential clinical risk. How the demographics of the reference class influence the clinical interpretation of deviations is unknown. Using normative modeling, a method for building reference classes, we evaluate the fairness (racial bias) of reference models of structural brain images that are widely used in psychiatry and neurology. We test whether including race in the model creates fairer models. To better understand bias in an integrated, multivariate sense, we predict self-reported race from the deviation scores of three different reference class normative models. Across all of these tasks, we uncover racial disparities that are not easily addressed with existing data or commonly used modeling techniques. Our work suggests that deviations from the norm could be due to demographic mismatch with the reference class, and that clinical meaning should be assigned to these deviations with caution. Our approach also suggests that acquiring more representative samples is an urgent research priority.
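The following toy sketch illustrates how demographic mismatch with a reference class can show up as deviation. It is not the modeling approach used in this work. It assumes a single simulated brain measure that declines linearly with age, fits an ordinary least-squares reference model, and scores new people as z-scores relative to that model. The group offset, noise level, sample sizes, and the |z| > 2 threshold are all hypothetical choices, and none of the values come from HCP or UKB.

```python
import random
import statistics


def fit_reference_model(ages, values):
    """Fit y = intercept + slope * age by ordinary least squares on the
    reference class; return (intercept, slope, residual standard deviation)."""
    mean_age = statistics.fmean(ages)
    mean_y = statistics.fmean(values)
    sxx = sum((a - mean_age) ** 2 for a in ages)
    sxy = sum((a - mean_age) * (y - mean_y) for a, y in zip(ages, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_age
    residuals = [y - (intercept + slope * a) for a, y in zip(ages, values)]
    return intercept, slope, statistics.stdev(residuals)


def deviation_scores(model, ages, values):
    """Deviation (z) score: observed minus predicted, scaled by residual SD."""
    intercept, slope, sigma = model
    return [(y - (intercept + slope * a)) / sigma for a, y in zip(ages, values)]


def simulate(n, offset, rng):
    """Hypothetical measure: declines with age, plus a group offset and noise."""
    ages = [rng.uniform(20, 80) for _ in range(n)]
    values = [3.0 - 0.01 * a + offset + rng.gauss(0, 0.1) for a in ages]
    return ages, values


def extreme_fraction(zs, threshold=2.0):
    """Share of people whose |z| exceeds the (hypothetical) clinical threshold."""
    return sum(abs(z) > threshold for z in zs) / len(zs)


rng = random.Random(0)

# Reference class drawn only from a group with offset 0.
model = fit_reference_model(*simulate(2000, 0.0, rng))

# A group that matches the reference class and one that is absent from it.
z_in = deviation_scores(model, *simulate(5000, 0.0, rng))
z_out = deviation_scores(model, *simulate(5000, 0.05, rng))

frac_in = extreme_fraction(z_in)
frac_out = extreme_fraction(z_out)
print(f"mean z (represented group): {statistics.fmean(z_in):.2f}")
print(f"mean z (unrepresented group): {statistics.fmean(z_out):.2f}")
print(f"|z| > 2 share: {frac_in:.3f} vs {frac_out:.3f}")
```

Neither simulated group contains any pathology. The unrepresented group nonetheless receives a shifted mean deviation and a larger share of "extreme" scores, purely because it differs from the reference class.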
