Metric-DST: Mitigating Selection Bias Through Diversity-Guided Semi-Supervised Metric Learning

Yasin I. Tepeli; Mathijs de Wolf; Joana P. Gonçalves

TL;DR

This work addresses selection bias in machine learning by introducing Metric-DST, a diversity-guided self-training framework built on metric learning to create a diverse, class-aware embedding space for pseudo-labeling unlabeled data. By sampling diverse regions of the embedding space rather than maximizing confidence, Metric-DST mitigates confirmation bias and improves generalization under biased training data. Across generated, real-world, and synthetic lethality datasets with varied bias scenarios, Metric-DST often matches or surpasses supervised performance and generally outperforms conventional self-training, especially when labeled data are scarce. The approach is flexible, classifier-agnostic, and broadly applicable for fairness-aware predictions in the presence of selection bias.

Abstract

Selection bias poses a critical challenge for fairness in machine learning, as models trained on data that is less representative of the population might exhibit undesirable behavior for underrepresented profiles. Semi-supervised learning strategies like self-training can mitigate selection bias by incorporating unlabeled data into model training to gain further insight into the distribution of the population. However, conventional self-training seeks to include high-confidence data samples, which may reinforce existing model bias and compromise effectiveness. We propose Metric-DST, a diversity-guided self-training strategy that leverages metric learning and its implicit embedding space to counter confidence-based bias through the inclusion of more diverse samples. Metric-DST learned more robust models in the presence of selection bias for generated and real-world datasets with induced bias, as well as a molecular biology prediction task with intrinsic bias. The Metric-DST learning strategy offers a flexible and widely applicable solution to mitigate selection bias and enhance fairness of machine learning models.
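The contrast between confidence-based and diversity-guided selection can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not spell out the Metric-DST selection rule, so greedy farthest-point sampling is swapped in here as a generic diversity heuristic. The function names, toy embeddings, and confidence values are all hypothetical.

```python
import math

def select_by_confidence(confidences, k):
    """Conventional self-training: keep the k most confident unlabeled points."""
    order = sorted(range(len(confidences)),
                   key=lambda i: confidences[i], reverse=True)
    return order[:k]

def select_diverse(embeddings, k, start=0):
    """Greedy farthest-point sampling (an assumed stand-in for diversity
    guidance): each pick is the point farthest from everything chosen so far."""
    chosen = [start]
    # Distance from every point to its nearest already-chosen point.
    min_dist = [math.dist(e, embeddings[start]) for e in embeddings]
    while len(chosen) < min(k, len(embeddings)):
        nxt = max(range(len(embeddings)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        min_dist = [min(d, math.dist(e, embeddings[nxt]))
                    for d, e in zip(min_dist, embeddings)]
    return chosen

# Hypothetical toy data: two clusters in a 2-D embedding space.
# The model is confident only in the first cluster, mimicking a biased model.
embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
confidences = [0.99, 0.98, 0.97, 0.60, 0.55]

conf_pick = select_by_confidence(confidences, 2)  # both picks from one cluster
div_pick = select_diverse(embeddings, 2)          # one pick from each cluster
print(conf_pick, div_pick)
```

In this toy setting the confidence-based rule draws both samples from the region the model already handles well, while the diversity-based rule reaches into the second cluster. That is the kind of confirmation bias the abstract says diversity guidance is meant to counter.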
