Evaluating Fairness in Self-supervised and Supervised Models for Sequential Data

Sofia Yfantidou, Dimitris Spathis, Marios Constantinides, Athena Vakali, Daniele Quercia, Fahim Kawsar

TL;DR

The paper examines fairness in time-series healthcare modeling by comparing self-supervised learning (SSL) with supervised learning across multiple models and fine-tuning strategies on the MIMIC-III dataset. It uses a SimCLR-style SSL framework with progressive layer freezing, evaluates performance with AUC-ROC and fairness with the Error Rate Ratio (ERR), and compares learned representations with centered kernel alignment (CKA). The key finding is that, depending on the fine-tuning strategy, SSL can perform on par with supervised models (about a 1% loss in performance) while improving fairness by up to 27%; differences in learned representations across demographic groups contribute to this effect. The work highlights SSL’s potential for fairer models in high-stakes, data-scarce, human-centric applications such as critical care monitoring and mortality prediction.
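
As a rough illustration of the fairness metric named above, the sketch below computes an error rate ratio between two demographic groups and its deviation from parity. It is a minimal sketch, not the authors' implementation: it assumes ERR is the misclassification rate of one group divided by that of another, with parity at 1, and the function names, the group labels "F" and "M", and the toy labels are all hypothetical.

```python
def error_rate(y_true, y_pred):
    """Fraction of predictions that differ from the true labels."""
    if not y_true:
        raise ValueError("empty group")
    errors = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return errors / len(y_true)


def error_rate_ratio(y_true, y_pred, groups, group_a, group_b):
    """Assumed definition: error rate of group_a divided by error rate of group_b."""
    def group_rate(name):
        idx = [i for i, g in enumerate(groups) if g == name]
        return error_rate([y_true[i] for i in idx], [y_pred[i] for i in idx])

    rate_b = group_rate(group_b)
    if rate_b == 0:
        raise ValueError("reference group has no errors; ratio is undefined")
    return group_rate(group_a) / rate_b


def deviation_from_parity(ratio):
    """Distance of the ratio from 1, the value at which both groups err equally."""
    return abs(1.0 - ratio)


# Hypothetical toy data: binary labels with a group attribute per sample.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1, 0, 0]
groups = ["F", "F", "F", "F", "M", "M", "M", "M"]

ratio = error_rate_ratio(y_true, y_pred, groups, "F", "M")
print(ratio, deviation_from_parity(ratio))
```

A ratio below 1 here means the first group has the lower error rate; a fairer model under this assumed definition is one whose ratio sits closer to 1.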

Abstract

Self-supervised learning (SSL) has become the de facto training paradigm of large models where pre-training is followed by supervised fine-tuning using domain-specific data and labels. Hypothesizing that SSL models would learn more generic, hence less biased, representations, this study explores the impact of pre-training and fine-tuning strategies on fairness (i.e., performing equally on different demographic breakdowns). Motivated by human-centric applications on real-world time-series data, we interpret inductive biases on the model, layer, and metric levels by systematically comparing SSL models to their supervised counterparts. Our findings demonstrate that SSL has the capacity to achieve performance on par with supervised methods while significantly enhancing fairness, exhibiting up to a 27% increase in fairness with a mere 1% loss in performance through self-supervision. Ultimately, this work underscores SSL's potential in human-centric computing, particularly high-stakes, data-scarce application domains like healthcare.

Paper Structure (10 sections, 4 figures, 1 table)

Figures (4)

  • Figure 1: Fine-tuning strategies for SSL workflows. We experiment with various levels of supervision to optimize both performance and fairness.
  • Figure 2: AUC-ROC curves across models. Depending on the fine-tuning strategy, SSL models achieve comparable performance to their supervised counterparts.
  • Figure 3: Deviation from parity in error rate ratio across models. While the supervised model has superior performance, it has a greater deviation from parity (dashed line) in terms of error rate ratio compared to the best-performing SSL model (i.e., 1 $\bullet\circ\bullet$).
  • Figure 4: Representation similarity through CKA conditioned on gender between the supervised and the (best) SSL models. The random data subset is balanced to represent both genders equally. The similarity of the SSL and supervised models is higher for male than female users.
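
The group-conditioned representation comparison described in the Figure 4 caption can be sketched with linear CKA. This is a minimal standard-library sketch rather than the paper's code: it assumes the linear variant of CKA (the source does not say which kernel is used), and `linear_cka`, `cka_by_group`, and the toy activations are hypothetical names and data. Each representation is a list of rows, one row per sample.

```python
import math


def center_columns(m):
    """Subtract each feature's mean so every column has zero mean."""
    n = len(m)
    means = [sum(col) / n for col in zip(*m)]
    return [[v - mu for v, mu in zip(row, means)] for row in m]


def cross_sq_norm(a, b):
    """Squared Frobenius norm of a^T b, accumulated over feature pairs."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            dot = sum(x * y for x, y in zip(col_a, col_b))
            total += dot * dot
    return total


def linear_cka(x, y):
    """Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on centered data."""
    x, y = center_columns(x), center_columns(y)
    denom = math.sqrt(cross_sq_norm(x, x) * cross_sq_norm(y, y))
    return cross_sq_norm(x, y) / denom


def cka_by_group(x, y, groups):
    """Compute linear CKA separately on the samples of each group."""
    scores = {}
    for name in sorted(set(groups)):
        idx = [i for i, g in enumerate(groups) if g == name]
        scores[name] = linear_cka([x[i] for i in idx], [y[i] for i in idx])
    return scores


# Hypothetical activations from two models on the same four samples.
acts_a = [[1.0, 2.0], [3.0, 4.0], [5.0, 7.0], [2.0, 0.0]]
acts_b = [[2.0 * v for v in row] for row in acts_a]  # same features, rescaled

print(linear_cka(acts_a, acts_b))
```

Linear CKA is invariant to isotropic rescaling of either representation, which is why the rescaled copy above scores as essentially identical. In a group-conditioned comparison, `cka_by_group` would be called with a balanced subset so that each group contributes the same number of samples.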