Does Data-Efficient Generalization Exacerbate Bias in Foundation Models?

Dilermando Queiroz, Anderson Carlos, Maíra Fatoretto, Luis Filipe Nakayama, André Anjos, Lilian Berton

TL;DR

The paper evaluates bias in a retinal foundation model fine-tuned with limited data on BRSET, comparing RetFound to a supervised baseline. It shows that self-supervised pretraining can mitigate bias relative to supervised training, but that data-efficient regimes can increase age-related disparities, highlighting a trade-off between utility and fairness. The study introduces fairness metrics adapted from MedFair 2023 and demonstrates that stratified sampling and broader validation are needed for equitable deployment of foundation models in medical imaging. These findings have practical implications for deploying data-efficient medical AI in underrepresented populations and stress the importance of fairness considerations in model selection and data curation.

Abstract

Foundation models have emerged as robust, label-efficient models in diverse domains. In medical imaging, where labeled data is difficult to obtain, these models contribute to the advancement of medical diagnosis. However, it is unclear whether pre-training on a large amount of unlabeled data that is biased with respect to sensitive attributes influences the fairness of the model. This research examines bias in a foundation model (RetFound) when it is fine-tuned on the Brazilian Multilabel Ophthalmological Dataset (BRSET), whose population differs from that of the pre-training dataset. Compared with supervised learning, the foundation model has the potential to reduce the gap between the maximum and minimum AUC across gender and age groups. However, in the data-efficient setting, the model's bias increases as the amount of fine-tuning data decreases. These findings suggest that fairness issues should be considered when deploying a foundation model in real-life scenarios with limited data.

Paper Structure

This paper contains 8 sections, 2 figures, and 1 table.

Figures (2)

  • Figure 1: This histogram displays the age distribution from the pre-processed BRSET dataset, categorizing patients into four age groups: 0-25, 26-50, 51-75, and 76-100 years. This categorization facilitates the evaluation of the max-min metrics presented in the experiments section.
  • Figure 2: The scatter plot delineates the relationship between minimum and maximum AUC values for two models: ViT-L with RetFound weights and a baseline model developed from scratch. Point size differentiates the models, while colors and distinct markers denote data percentages and sensitive attributes, respectively. A reference grey line enables performance comparison, succinctly encapsulating AUC dynamics and attribute interactions.
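
The max-min AUC comparison described in the abstract and in the two captions can be illustrated with a small sketch. This is not the authors' code: the function names (`age_group`, `binary_auc`, `max_min_auc`) and the toy data are hypothetical, and the only details taken from the text are the four age bins from Figure 1 and the idea of comparing the maximum and minimum AUC across subgroups of a sensitive attribute. It assumes a binary label and uses only the Python standard library.

```python
from collections import defaultdict

# Age bins taken from the Figure 1 caption (inclusive bounds, in years).
AGE_BINS = [(0, 25), (26, 50), (51, 75), (76, 100)]


def age_group(age):
    """Map an age in years to its bin label, e.g. 30 -> '26-50'."""
    for lo, hi in AGE_BINS:
        if lo <= age <= hi:
            return f"{lo}-{hi}"
    raise ValueError(f"age {age} is outside the 0-100 range")


def binary_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined when a group lacks one of the classes
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def max_min_auc(labels, scores, groups):
    """Return (per-group AUCs, max AUC, min AUC, max-min gap) over subgroups."""
    by_group = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        by_group[g][0].append(y)
        by_group[g][1].append(s)
    aucs = {}
    for g, (ys, ss) in by_group.items():
        auc = binary_auc(ys, ss)
        if auc is not None:  # skip groups where AUC cannot be computed
            aucs[g] = auc
    hi, lo = max(aucs.values()), min(aucs.values())
    return aucs, hi, lo, hi - lo


# Toy data (made up for illustration, not from BRSET).
labels = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.1, 0.8, 0.3, 0.4, 0.6, 0.7, 0.2]
ages = [20, 22, 40, 45, 60, 70, 80, 90]
groups = [age_group(a) for a in ages]

aucs, hi, lo, gap = max_min_auc(labels, scores, groups)
print(aucs, hi, lo, gap)
```

In this toy example the 51-75 group is the only one where the negative sample is scored above the positive one, so it sets the minimum AUC and determines the gap. A smaller gap means more uniform performance across groups; in a plot like Figure 2, that would correspond to a point where the minimum AUC sits close to the maximum AUC.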