Using Backbone Foundation Model for Evaluating Fairness in Chest Radiography Without Demographic Data
Dilermando Queiroz, André Anjos, Lilian Berton
TL;DR
This work addresses fairness evaluation in chest radiography when demographic attributes are unavailable. It uses the backbone of a Foundation Model (FM) as an embedding extractor to form proxy groups for protected attributes: high-dimensional image embeddings are reduced with t-SNE, and clusters are discovered with DBSCAN, yielding balanced, demographic-free subsets for fairness assessment on in-distribution (ID) CheXpert and out-of-distribution (OOD) NIH data. The approach substantially reduces gender disparity, by approximately $4.44$ percentage points ID and $6.16$ points OOD, while age-related fairness remains less robust, highlighting the need for more robust FMs. Overall, the framework enables fairness analysis and mitigation without explicit demographic data, supporting more equitable medical diagnostics and guiding future FM development.
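The grouping step summarized above (backbone embeddings, then t-SNE, then DBSCAN) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: random synthetic vectors stand in for Foundation Model backbone embeddings, and the perplexity, `min_samples`, and the k-nearest-neighbor heuristic used to pick DBSCAN's `eps` are illustrative choices rather than values from the paper. The helper name `proxy_groups` is hypothetical.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.manifold import TSNE
from sklearn.neighbors import NearestNeighbors


def proxy_groups(embeddings: np.ndarray, min_samples: int = 5, seed: int = 0) -> np.ndarray:
    """Hypothetical helper: assign each image embedding a proxy-group label.

    Returns DBSCAN cluster labels; -1 marks points DBSCAN treats as noise.
    """
    # Project high-dimensional backbone embeddings to 2-D with t-SNE.
    low_dim = TSNE(
        n_components=2, perplexity=30.0, init="pca", random_state=seed
    ).fit_transform(embeddings)

    # Assumed heuristic for eps: scale the 90th percentile of the distance to the
    # (min_samples - 1)-th other neighbor (column 0 is each point itself).
    dists, _ = NearestNeighbors(n_neighbors=min_samples).fit(low_dim).kneighbors(low_dim)
    eps = float(1.5 * np.percentile(dists[:, -1], 90))

    # Density-based clustering in the 2-D t-SNE space.
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(low_dim)


# Synthetic stand-in for backbone embeddings: two well-separated groups of 100 images.
rng = np.random.default_rng(0)
hidden_group = np.repeat([0, 1], 100)
embeddings = rng.normal(size=(200, 64)) + 8.0 * hidden_group[:, None]

labels = proxy_groups(embeddings)
print(sorted(set(labels.tolist())))
```

DBSCAN is a natural fit here because it does not require fixing the number of groups in advance and can leave outliers unassigned. In this sketch, images labeled -1 simply receive no proxy group. Whether a discovered cluster actually tracks a protected attribute still has to be checked; the age results reported for this method show that it need not.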
Abstract
Ensuring consistent performance across diverse populations and incorporating fairness into machine learning models are crucial for advancing medical image diagnostics and promoting equitable healthcare. However, many databases do not provide protected attributes or contain unbalanced representations of demographic groups, which complicates both the evaluation of model performance across demographics and the application of bias mitigation techniques that rely on these attributes. This study investigates the effectiveness of using the backbone of Foundation Models as an embedding extractor to create groups that represent protected attributes, such as gender and age. We propose using these groups at different stages of bias mitigation, including pre-processing, in-processing, and evaluation. Using databases in in-distribution and out-of-distribution scenarios, we find that the method can create groups that represent gender in both databases and reduces the gender disparity by 4.44% in-distribution and 6.16% out-of-distribution. However, the model lacks robustness in handling the age attribute, underscoring the need for more fundamentally fair and robust Foundation Models. These findings suggest that this approach can support fairness assessment in scenarios where protected attributes are unknown, contributing to the development of more equitable medical diagnostics.
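Once proxy groups exist, a disparity such as the gender gap reported above can be measured by comparing a performance metric across groups. The sketch below assumes AUC as the metric and the max-min difference across groups as the disparity measure; this section does not state which metric the paper uses, and `group_auc_gap` and the toy arrays are hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def group_auc_gap(y_true, y_score, groups):
    """Hypothetical helper: per-group AUC and the max-min AUC gap across proxy groups.

    DBSCAN noise points (label -1) are excluded, as are groups containing a
    single class, where AUC is undefined.
    """
    y_true, y_score, groups = map(np.asarray, (y_true, y_score, groups))
    aucs = {}
    for g in np.unique(groups):
        if g == -1:
            continue
        mask = groups == g
        if len(np.unique(y_true[mask])) < 2:
            continue
        aucs[int(g)] = float(roc_auc_score(y_true[mask], y_score[mask]))
    if not aucs:
        raise ValueError("no group with both classes present")
    return aucs, max(aucs.values()) - min(aucs.values())


# Toy data: two proxy groups plus two noise points (label -1) that are ignored.
groups = [0, 0, 0, 0, 1, 1, 1, 1, -1, -1]
y_true = [0, 0, 1, 1, 0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.2, 0.8, 0.9, 0.1, 0.6, 0.4, 0.9, 0.9, 0.1]

aucs, gap = group_auc_gap(y_true, y_score, groups)
print(aucs, f"gap = {100 * gap:.2f} percentage points")
```

In this toy example, group 0 ranks every positive above every negative (AUC 1.0), while group 1 orders 3 of its 4 positive-negative pairs correctly (AUC 0.75), so the gap is 25 percentage points. A mitigation step that works as intended should shrink this gap without lowering overall performance.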
