Using Backbone Foundation Model for Evaluating Fairness in Chest Radiography Without Demographic Data

Dilermando Queiroz, André Anjos, Lilian Berton

TL;DR

This work addresses fairness evaluation in chest radiography when demographic attributes are unavailable. It uses the backbone of a Foundation Model (FM) as an embedding extractor to form proxy groups for protected attributes: high-dimensional image embeddings are reduced with t-SNE, and DBSCAN then discovers clusters that are used to build balanced subsets without demographic labels. Fairness is assessed on in-distribution (ID) CheXpert data and out-of-distribution (OOD) NIH data. The approach reduces gender disparity by approximately $4.44$ percentage points in ID and $6.16$ percentage points in OOD, but it does not handle age reliably, which highlights the need for fairer and more robust Foundation Models. Overall, the framework enables fairness analysis and mitigation without explicit demographic data, supporting more equitable medical diagnostics and guiding future FM development.

Abstract

Ensuring consistent performance across diverse populations and incorporating fairness into machine learning models are crucial for advancing medical image diagnostics and promoting equitable healthcare. However, many databases either do not provide protected attributes or contain unbalanced representations of demographic groups. This complicates both the evaluation of model performance across demographics and the application of bias mitigation techniques that rely on these attributes. This study investigates the effectiveness of using the backbone of Foundation Models as an embedding extractor for creating groups that represent protected attributes, such as gender and age. We propose using these groups at different stages of bias mitigation, including pre-processing, in-processing, and evaluation. Using databases in in-distribution and out-of-distribution scenarios, we find that the method can create groups that represent gender in both databases, reducing the difference between gender groups by 4.44% in-distribution and by 6.16% out-of-distribution. However, the model lacks robustness in handling age attributes, underscoring the need for more fundamentally fair and robust Foundation Models. These findings suggest that the method can support fairness assessment in scenarios where protected attributes are unknown, contributing to the development of more equitable medical diagnostics.
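
The grouping step described above (embeddings, then t-SNE, then DBSCAN) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `proxy_groups`, the synthetic embeddings, and all hyperparameter values (`perplexity`, `eps`, `min_samples`) are assumptions chosen only to make the example run. In practice the embeddings would come from the Foundation Model backbone.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.manifold import TSNE


def proxy_groups(embeddings, perplexity=30.0, eps=3.0, min_samples=10, seed=0):
    """Reduce embeddings to 2-D with t-SNE, then cluster them with DBSCAN.

    Returns (points_2d, labels). A label of -1 marks points that DBSCAN
    left unclustered. All default values here are illustrative assumptions.
    """
    points_2d = TSNE(
        n_components=2, perplexity=perplexity, init="pca", random_state=seed
    ).fit_transform(embeddings)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points_2d)
    return points_2d, labels


# Synthetic stand-in for Foundation Model embeddings (hypothetical data):
# two separated blobs of 100 points each in a 32-dimensional space.
rng = np.random.default_rng(0)
fake_embeddings = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(100, 32)),
    rng.normal(loc=8.0, scale=1.0, size=(100, 32)),
]).astype(np.float32)

points_2d, labels = proxy_groups(fake_embeddings)
print(points_2d.shape, sorted(set(labels.tolist())))
```

The number of clusters found depends strongly on `eps` and on the scale of the t-SNE output, so these values would need tuning for real embeddings.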

Paper Structure (6 sections, 2 figures)

Figures (2)

  • Figure 1: (a) Overview of how the groups formed by the proposed method can be applied in different contexts, such as model processing, subset selection, and metric evaluation. (b) The process begins with a Foundation Model (FM), trained on a large corpus of chest X-ray images, which extracts embeddings from a dataset that lacks sensitive attributes. The embeddings are then reduced in dimensionality with t-SNE (van der Maaten and Hinton, 2008), which facilitates clustering in a lower-dimensional space and improves computational efficiency. DBSCAN (Ester et al., 1996) is then applied to identify clusters, which are used to form a notion of groups. (c) Visualization of the embeddings after reduction to two dimensions with t-SNE. Points are annotated by patient age and gender for the CheXpert (in-distribution) and NIH (out-of-distribution) databases.
  • Figure 2: (a) Age distribution across clusters in the CheXpert (in-distribution) and NIH (out-of-distribution) databases. Cluster $-1$ holds the unclustered data; all other numbers identify clusters. (b) Gender distribution across clusters for the CheXpert and NIH datasets. (c) Gender distribution after sampling 30% of the CheXpert and NIH datasets. (d) Kernel density estimate (KDE) plots comparing the age distributions of the CheXpert and NIH subsets obtained with random sampling and with cluster sampling, split by patient gender.
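
The cluster sampling mentioned in Figure 2 can be illustrated with a short sketch. The captions do not spell out the exact sampling scheme, so the round-robin strategy below is an assumption: it draws roughly 30% of the dataset while taking items from each cluster in turn, so that small clusters are not swamped by large ones. The function name `cluster_sample` and the toy labels are hypothetical.

```python
import random
from collections import defaultdict


def cluster_sample(labels, fraction=0.3, seed=0, noise_label=-1):
    """Pick about `fraction` of all items, balanced across clusters.

    Items are drawn round-robin, one per cluster per pass, until the target
    size is reached or every cluster is exhausted. Items carrying
    `noise_label` (unclustered points) are never selected.
    """
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for idx, lab in enumerate(labels):
        if lab != noise_label:
            by_cluster[lab].append(idx)
    for members in by_cluster.values():
        rng.shuffle(members)  # randomize which members are drawn first

    target = round(fraction * len(labels))
    pools = [by_cluster[key] for key in sorted(by_cluster)]
    chosen = []
    while len(chosen) < target and any(pools):
        for pool in pools:
            if pool and len(chosen) < target:
                chosen.append(pool.pop())
    return sorted(chosen)


# Toy labels: a large cluster, a small cluster, and some unclustered points.
demo_labels = [0] * 70 + [1] * 20 + [-1] * 10
picked = cluster_sample(demo_labels, fraction=0.3)
per_cluster = {c: sum(1 for i in picked if demo_labels[i] == c) for c in (0, 1)}
print(len(picked), per_cluster)
```

With these toy labels, a uniformly random 30% sample would be dominated by the large cluster, whereas the round-robin draw takes the same number of items from each cluster as long as both still have members left.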