Are demographically invariant models and representations in medical imaging fair?

Eike Petersen; Enzo Ferrante; Melanie Ganz; Aasa Feragen

TL;DR

This paper questions whether enforcing demographically invariant representations is desirable in medical imaging. It analyzes marginal invariance (leading to statistical parity) and class-conditional invariance (leading to separation/equalized odds), along with counterfactual invariance, highlighting their trade-offs, particularly when disease prevalence differs across groups. The authors argue that both invariance types can harm predictive performance and calibration and may not guarantee fair treatment, while counterfactual approaches face substantial definitional challenges in medical imaging. They conclude that encoding demographic attributes is not inherently unfair and can even be advantageous for learning task-relevant, physiology-based encodings, urging comprehensive subgroup fairness assessments and practical mitigation strategies rather than strict invariance.
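The calibration trade-off can be checked with Bayes' rule. Below is a minimal sketch, not the authors' code. It assumes a binary label and two groups with hypothetical prevalences of 0.1 and 0.3. It also assumes hypothetical Gaussian class-conditional score densities that are shared by both groups. Class-conditional invariance enforces exactly this sharing for any score computed from the representation. The function names `gaussian_pdf` and `true_risk` are illustrative.

```python
import math


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Density of a normal distribution N(mean, std^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def true_risk(score: float, prevalence: float) -> float:
    """P(Y=1 | S=score, A=a) via Bayes' rule when the class-conditional score
    densities are identical across groups (what class-conditional invariance implies).
    Hypothetical densities: S | Y=0 ~ N(0, 1) and S | Y=1 ~ N(2, 1)."""
    p_sick = gaussian_pdf(score, 2.0, 1.0)
    p_healthy = gaussian_pdf(score, 0.0, 1.0)
    return prevalence * p_sick / (prevalence * p_sick + (1 - prevalence) * p_healthy)


# Hypothetical prevalences for groups a1 and a2.
prevalences = {"a1": 0.1, "a2": 0.3}

for s in (0.0, 1.0, 2.0):
    risks = {g: round(true_risk(s, pi), 3) for g, pi in prevalences.items()}
    print(f"score={s}: true risk per group = {risks}")
```

The same score corresponds to different true risks in the two groups. A single score-to-risk mapping therefore cannot be calibrated in both groups at once. At score 1.0, the two densities cross, so the true risk simply equals each group's prevalence (0.1 vs 0.3).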

Abstract

Medical imaging models have been shown to encode information about patient demographics such as age, race, and sex in their latent representation, raising concerns about their potential for discrimination. Here, we ask whether requiring models not to encode demographic attributes is desirable. We point out that marginal and class-conditional representation invariance imply the standard group fairness notions of demographic parity and equalized odds, respectively. In addition, however, they require matching the risk distributions, thus potentially equalizing away important group differences. Enforcing the traditional fairness notions directly instead does not entail these strong constraints. Moreover, representationally invariant models may still take demographic attributes into account for deriving predictions, implying unequal treatment; in fact, achieving representation invariance may require doing so. In theory, this can be prevented using counterfactual notions of (individual) fairness or invariance. We caution, however, that properly defining medical image counterfactuals with respect to demographic attributes is fraught with challenges. Finally, we posit that encoding demographic attributes may even be advantageous if it enables learning a task-specific encoding of demographic features that does not rely on social constructs such as 'race' and 'gender.' We conclude that demographically invariant representations are neither necessary nor sufficient for fairness in medical imaging. Models may need to encode demographic attributes, lending further urgency to calls for comprehensive model fairness assessments in terms of predictive performance across diverse patient groups.
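The two group fairness notions named above can be made concrete with a small computation. Below is a minimal sketch, assuming binary predictions and a hypothetical cohort of two groups, a1 and a2. Each group has 100 patients, with prevalences of 10% and 30%. The helper `fairness_gaps` is illustrative and not from the paper. The sketch shows two things. First, a perfect classifier satisfies equalized odds but violates demographic parity whenever prevalences differ. Second, forcing equal positive rates requires misclassifying patients, which is the enforced misclassification under marginal invariance depicted in Figure 1.

```python
from typing import Callable, Dict, List, Sequence


def rate(values: Sequence[int]) -> float:
    """Fraction of positive (1) predictions in a non-empty list."""
    return sum(values) / len(values)


def fairness_gaps(y_true: Sequence[int], y_pred: Sequence[int],
                  group: Sequence[str]) -> Dict[str, float]:
    """Absolute between-group gaps for groups 'a1' and 'a2' (hypothetical helper)."""

    def preds(g: str, keep: Callable[[int], bool]) -> List[int]:
        # Predictions for patients in group g whose true label passes the filter.
        return [p for t, p, a in zip(y_true, y_pred, group) if a == g and keep(t)]

    def gap(keep: Callable[[int], bool]) -> float:
        return abs(rate(preds("a1", keep)) - rate(preds("a2", keep)))

    return {
        "demographic_parity": gap(lambda t: True),  # compares P(Yhat=1 | A)
        "tpr_gap": gap(lambda t: t == 1),           # equalized odds: P(Yhat=1 | Y=1, A)
        "fpr_gap": gap(lambda t: t == 0),           # equalized odds: P(Yhat=1 | Y=0, A)
    }


# Hypothetical cohort: 100 patients per group, prevalence 10% in a1 and 30% in a2.
group = ["a1"] * 100 + ["a2"] * 100
y_true = [1] * 10 + [0] * 90 + [1] * 30 + [0] * 70

# A perfect classifier: equalized odds holds, demographic parity does not.
perfect = fairness_gaps(y_true, y_true, group)

# One way to force a 20% positive rate in both groups:
# 10 false positives in a1 and 10 false negatives in a2.
y_parity = [1] * 10 + [1] * 10 + [0] * 80 + [1] * 20 + [0] * 10 + [0] * 70
parity = fairness_gaps(y_true, y_parity, group)
errors = sum(t != p for t, p in zip(y_true, y_parity))

print("perfect classifier:", perfect)
print("parity-constrained:", parity, "misclassified:", errors)
```

This parity-constrained predictor is only one of many that equalize positive rates. Any such predictor must misclassify some patients here, because the true positive rates of the two groups differ by 20 percentage points.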

Paper Structure (7 sections, 2 equations, 4 figures)

Introduction
Problem setting and notation
Marginal representation invariance
Class-conditional representation invariance
General drawbacks of representation invariance
Model invariance
Discussion & Conclusion

Figures (4)

Figure 1: An illustration of the effects of enforcing marginal and class-conditional representation invariance in the case of a disease distribution with prevalence differences between two groups ($a_1$ and $a_2$). The example shown is an illustrative case in which perfect classification is possible. Representation invariance implies invariance of the risk distributions and classification rates because of the deterministic relationships between the latent representation, risk predictions, and binary classifications. Note the enforced misclassification (shaded) in the case of marginal invariance, and the enforced identity across groups of the marginal and class-conditional predicted risk distributions, respectively. While class-conditional representation invariance solves the illustrated issue due to prevalence differences, it still has drawbacks: it is incompatible with group-wise model calibration (see the section on class-conditional representation invariance) and requires equalizing away potentially important differences between unknown disease subtypes (see the section on general drawbacks of representation invariance and Figure 3).
Figure 2: Accuracy with which binary group membership $A$ can be predicted from the target label $Y$ alone, fully ignoring any input data $X$, as a function of the dimensionality of $Y$. Each element of $Y$ is assumed binary here, representing, e.g., multiple binary disease labels. Incidences per disease (i.e., per element of $Y$) are drawn randomly from $[0.1, 0.9]$ for the two groups; the mean and standard deviation of the resulting prediction accuracy over 1000 repetitions are shown. (If prevalences are identical, a label does not increase the identifiability of $A$.) Note that if $A$ is identifiable from $Y$, the same is true for any accurate model predictions $\hat{Y}$. This illustrates why asking for groups to be non-identifiable in the presence of prevalence differences is ill-advised, and how this problem is aggravated for higher-dimensional $Y$, such as in multi-class settings or segmentation.
Figure 3: An illustration of the effects of enforcing (class-conditional) representation invariance in the case of differences in the within-class distributions between two groups ($a_1$ and $a_2$). The distribution of the (unlabeled/unobserved) disease severity may differ between sick patients of the two groups (left panel). Enforcing (class-conditionally) identical latent representations $Z$ (right panel) requires mapping patients with differing disease severities to the same latent representation, depending on which group a patient belongs to (center). This will also result in them being assigned the same risk predictions (see the section on class-conditional representation invariance).
Figure 4: A simplistic causal diagram illustrating some of the many ways in which biological (birth) sex may causally influence medical image recordings $x$. With respect to which of the causal paths from "Birth Sex" to $x$ should a disease classification model be (counterfactually) invariant?
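The experiment described in the Figure 2 caption can be approximated with a short simulation. Below is a minimal sketch, not the authors' code. It rests on assumptions the caption does not state: equal group sizes, labels that are independent across diseases within each group, and an exact Bayes-optimal guess of $A$ from $Y$. The Bayes-optimal guess is computed by enumerating all $2^d$ label vectors, so the sketch is only practical for small $d$. The helpers `bayes_accuracy` and `simulate` are hypothetical.

```python
import itertools
import math
import random
import statistics
from typing import Sequence, Tuple


def bayes_accuracy(prev_a1: Sequence[float], prev_a2: Sequence[float]) -> float:
    """Accuracy of the Bayes-optimal guess of group A from the label vector Y alone,
    assuming equal group sizes and independent binary labels within each group."""
    total = 0.0
    for y in itertools.product((0, 1), repeat=len(prev_a1)):
        # Likelihood of label vector y under each group's per-disease prevalences.
        p1 = math.prod(q if yi else 1 - q for q, yi in zip(prev_a1, y))
        p2 = math.prod(q if yi else 1 - q for q, yi in zip(prev_a2, y))
        # With prior 0.5 per group, the optimal guess is correct with prob. 0.5 * max(p1, p2).
        total += 0.5 * max(p1, p2)
    return total


def simulate(dim: int, repetitions: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Mean and standard deviation of the Bayes accuracy over random prevalences in [0.1, 0.9]."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(repetitions):
        prev_a1 = [rng.uniform(0.1, 0.9) for _ in range(dim)]
        prev_a2 = [rng.uniform(0.1, 0.9) for _ in range(dim)]
        accuracies.append(bayes_accuracy(prev_a1, prev_a2))
    return statistics.mean(accuracies), statistics.stdev(accuracies)


for dim in (1, 2, 4, 8):
    mean, std = simulate(dim, repetitions=200)  # fewer repetitions than the caption, for speed
    print(f"dim={dim}: accuracy {mean:.3f} +/- {std:.3f}")
```

With identical prevalences, `bayes_accuracy` returns 0.5 (chance level). This matches the caption's remark that a label then does not increase the identifiability of $A$.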
