Positive-Sum Fairness: Leveraging Demographic Attributes to Achieve Fair AI Outcomes Without Sacrificing Group Gains

Samia Belhadj, Sanguk Park, Ambika Seth, Hesham Dar, Thijs Kooi

TL;DR

This paper introduces the notion of positive-sum fairness: an increase in performance that results in a larger group disparity is acceptable as long as it does not come at the cost of any individual subgroup's performance.

Abstract

Fairness in medical AI is increasingly recognized as a crucial aspect of healthcare delivery. While most prior work on fairness emphasizes the importance of equal performance, we argue that decreases in fairness can be either harmful or non-harmful, depending on the type of change and how sensitive attributes are used. To this end, we introduce the notion of positive-sum fairness, which states that an increase in performance that results in a larger group disparity is acceptable as long as it does not come at the cost of individual subgroup performance. This allows sensitive attributes correlated with the disease to be used to increase performance without compromising on fairness. We illustrate this idea by comparing four CNN models that make different use of the race attribute in the training phase. The results show that removing all demographic encodings from the images helps close the gap in performance between the different subgroups, whereas using the race attribute as a model input increases the overall performance while widening the disparities between subgroups. These larger gaps are then weighed against the collective benefit through our notion of positive-sum fairness, to distinguish harmful from non-harmful disparities.
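The positive-sum criterion described above can be sketched as a simple check, assuming a per-subgroup performance score (for example AUROC) is available for both a baseline and an updated model. The function name, the `tol` parameter, and all numbers below are illustrative assumptions, not taken from the paper.

```python
def is_positive_sum(baseline, updated, tol=0.0):
    """Return True if no subgroup performs worse under the updated model.

    baseline, updated: dicts mapping subgroup name -> performance score.
    tol: hypothetical tolerance for drops considered negligible (assumption).
    """
    if baseline.keys() != updated.keys():
        raise ValueError("both models must be evaluated on the same subgroups")
    return all(updated[g] >= baseline[g] - tol for g in baseline)


# Hypothetical scores, for illustration only.
baseline = {"Race A": 0.80, "Race B": 0.82, "Race C": 0.81}
widened_gap = {"Race A": 0.80, "Race B": 0.83, "Race C": 0.88}
harmful = {"Race A": 0.78, "Race B": 0.83, "Race C": 0.88}

print(is_positive_sum(baseline, widened_gap))  # gap widens, but no subgroup drops
print(is_positive_sum(baseline, harmful))      # Race A drops below its baseline
```

In this sketch the second model widens the gap between subgroups yet passes the check, while the third fails because one subgroup loses performance relative to the baseline.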

Paper Structure (15 sections, 3 figures)

Figures (3)

  • Figure 1: We investigate the fairness of AI models and introduce the concept of 'positive-sum fairness' to differentiate harmful from non-harmful disparities. (a) shows the performance of an initial model per protected group. (b) shows the performance of an updated model with higher overall performance but lower fairness under the standard definition, as indicated by the larger difference between the most and least advantaged groups; this model could therefore be rejected on the basis of fairness. (c) shows the same updated model as (b), but plots the performance difference per group relative to the initial model. In this positive-sum framing, we see that no group had a reduction in performance, so the increased performance for Race C did not come at the cost of performance in any other group.
  • Figure 2: To investigate the effect of sensitive attributes on performance and fairness, we evaluate four model architectures, denoted M1, M2, M3 and M4. M1, the baseline, has a backbone and a classifier. M2 has a race encoding branch that learns race-encoded features directly from metadata. M3 and M4 have an additional race branch that predicts, from the image features, the race group implicitly encoded in the image. The difference between M3 and M4 is a gradient reversal layer added before the race branch.
  • Figure 3: We place two fairness-versus-performance frameworks side by side. In (a), we compute, per lesion, both the performance (AUROC) and the fairness (defined as 1 minus the difference in AUROC between the most and least advantaged groups) of the four models. In (b), we show the difference in overall performance and in performance per protected subgroup between the three improved classifiers and the baseline M1. The x axis compares the performance of each improved classifier with the baseline, and the y axis shows whether at least one protected subgroup has been harmed by the modifications made to the baseline classifier.
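
The two framings in the Figure 3 caption can be sketched side by side, assuming per-subgroup and overall AUROC values have already been computed for each model. The function names and the example numbers are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def fairness_score(group_auroc):
    """Standard framing: 1 - (AUROC of most advantaged - AUROC of least advantaged)."""
    values = list(group_auroc.values())
    return 1.0 - (max(values) - min(values))


def compare_to_baseline(base_overall, base_groups, new_overall, new_groups):
    """Positive-sum framing: change in overall and per-subgroup AUROC vs. the baseline.

    Returns (overall_delta, group_deltas, harmed), where harmed lists the
    subgroups whose AUROC is lower than under the baseline.
    """
    overall_delta = new_overall - base_overall
    group_deltas = {g: new_groups[g] - base_groups[g] for g in base_groups}
    harmed = [g for g, d in group_deltas.items() if d < 0]
    return overall_delta, group_deltas, harmed


# Hypothetical AUROC values for a baseline and one modified classifier.
m1_overall, m1_groups = 0.81, {"A": 0.80, "B": 0.82, "C": 0.81}
m2_overall, m2_groups = 0.84, {"A": 0.81, "B": 0.82, "C": 0.89}

overall_delta, group_deltas, harmed = compare_to_baseline(
    m1_overall, m1_groups, m2_overall, m2_groups
)
print(fairness_score(m1_groups), fairness_score(m2_groups))
print(overall_delta, harmed)
```

With these made-up numbers the modified classifier scores lower under the standard fairness measure, yet the baseline comparison reports a higher overall AUROC and an empty list of harmed subgroups, which is the situation the positive-sum framing is meant to surface.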