FairFlow: Mitigating Dataset Biases through Undecided Learning

Jiali Cheng, Hadi Amiri

TL;DR

Dataset shortcuts cause NLP models to rely on biases that hurt generalization. FairFlow tackles this with undecided learning, using explicit and implicit multiview perturbations and a supervised contrastive objective to push predictions on biased views toward the uniform distribution ($U$) while preserving correct predictions on intact inputs. Empirical results on MNLI, QQP, and PGR show robust debiasing with strong stress-test and out-of-domain (OOD) transfer performance, often surpassing baselines without sacrificing in-domain accuracy and with minimal additional parameters. These findings advance bias mitigation by enabling multiview, architecture-agnostic debiasing, though the authors acknowledge that some biases remain and suggest directions for further improvement.
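The undecided-learning objective can be sketched as an InfoNCE-style contrastive loss in which the uniform distribution $U$ is the anchor, predictions on biased views are positives, and predictions on intact inputs are negatives, added to ordinary cross-entropy on the intact inputs. This is a minimal pure-Python sketch under stated assumptions, not the authors' exact formulation: cosine similarity between probability vectors, the temperature `tau`, the weight `lam`, and all function names are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable softmax over one logit vector.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity between two vectors (assumed similarity measure).
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def undecided_contrastive_loss(biased_logits: Sequence[Sequence[float]],
                               intact_logits: Sequence[Sequence[float]],
                               tau: float = 0.1) -> float:
    # Anchor: the uniform distribution U over K classes.
    k = len(intact_logits[0])
    uniform = [1.0 / k] * k
    # Negatives: predictions on intact inputs should move AWAY from U.
    neg_sum = sum(math.exp(cosine(uniform, softmax(z)) / tau) for z in intact_logits)
    losses = []
    for z in biased_logits:
        # Positives: predictions on biased views should move TOWARD U.
        pos = math.exp(cosine(uniform, softmax(z)) / tau)
        losses.append(-math.log(pos / (pos + neg_sum)))
    return sum(losses) / len(losses)


def cross_entropy(logits: Sequence[float], label: int) -> float:
    # Standard cross-entropy for a single example.
    return -math.log(softmax(logits)[label])


def total_loss(intact_logits: Sequence[Sequence[float]], labels: Sequence[int],
               biased_logits: Sequence[Sequence[float]],
               lam: float = 1.0, tau: float = 0.1) -> float:
    # Keep intact predictions correct, and push biased-view predictions to U.
    ce = sum(cross_entropy(z, y) for z, y in zip(intact_logits, labels)) / len(labels)
    return ce + lam * undecided_contrastive_loss(biased_logits, intact_logits, tau)
```

Under this sketch, a biased view whose prediction is already near uniform contributes little loss, while a confident biased prediction is penalized; the cross-entropy term is what keeps intact-input predictions correct rather than merely non-uniform.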

Abstract

Language models are prone to dataset biases, known as shortcuts and spurious correlations in data, which often result in a performance drop on new data. We present a new debiasing framework called "FairFlow" that mitigates dataset biases by learning to be undecided in its predictions for data samples or representations associated with known or unknown biases. The framework introduces two key components: a suite of data and model perturbation operations that generate different biased views of input samples, and a contrastive objective that learns debiased and robust representations from the resulting biased views of samples. Experiments show that FairFlow outperforms existing debiasing methods, particularly on out-of-domain and hard test samples, without compromising in-domain performance.
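The perturbation suite can be illustrated with a small pure-Python sketch that produces several biased views of an NLI premise-hypothesis pair. The hypothesis-only view follows Figure 1; the token-shuffling view and the hidden-unit masking view (standing in for a "severely corrupted representation") are hypothetical examples chosen for illustration, as are all function names and the default `drop_rate`. The paper's actual operations may differ.

```python
import random
from typing import List, Tuple


def hypothesis_only(premise: str, hypothesis: str) -> Tuple[str, str]:
    """Explicit (data-level) perturbation from Figure 1: drop the premise."""
    return "", hypothesis


def shuffle_tokens(text: str, rng: random.Random) -> str:
    """Hypothetical explicit perturbation: destroy word order, keep the words."""
    tokens = text.split()
    rng.shuffle(tokens)
    return " ".join(tokens)


def corrupt_representation(h: List[float], drop_rate: float,
                           rng: random.Random) -> List[float]:
    """Hypothetical implicit (model-level) perturbation: zero out each hidden
    unit with probability drop_rate (no rescaling of the surviving units)."""
    return [0.0 if rng.random() < drop_rate else x for x in h]


def biased_views(premise: str, hypothesis: str, hidden: List[float],
                 rng: random.Random, drop_rate: float = 0.9) -> dict:
    """Collect several biased views of one example (illustrative only)."""
    return {
        "hypothesis_only": hypothesis_only(premise, hypothesis),
        "shuffled": (shuffle_tokens(premise, rng), shuffle_tokens(hypothesis, rng)),
        "corrupted_hidden": corrupt_representation(hidden, drop_rate, rng),
    }
```

Each view would then be passed through the same classifier to obtain the biased predictions that the contrastive objective pushes toward the uniform distribution.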

Paper Structure

This paper contains 39 sections, 6 equations, 3 figures, and 9 tables.

Figures (3)

  • Figure 1: An example highlighting the concept of "undecided learning" using two types of data perturbation techniques. Given a premise-hypothesis pair in NLI, the model is expected to correctly classify their entailment relationship. However, given only the hypothesis, a robust model should be undecided, i.e., refrain from making a definite judgment about the relationship between an unknown premise and the given hypothesis. Similarly, given a severely corrupted representation, a robust model should be undecided about the relation between a corrupted premise and hypothesis pair. Models that retain confidence in assigning labels to such inputs are likely to rely on shortcuts. FairFlow takes an undecided stance against such inputs.
  • Figure 2: Architecture of the proposed model. (a) Explicit and implicit perturbations are applied to inputs to obtain biased predictions $z_{\mathrm{Biased}}$. (b) Through contrastive learning, biased predictions are drawn toward the uniform distribution, while predictions for intact inputs are pushed away from it.
  • Figure 3: Debiasing performance with different combinations of explicit and implicit perturbations. The values indicate relative accuracy increase compared to vanilla fine-tuning.
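One way to quantify the "undecided" behavior described in Figure 1 is the KL divergence from a prediction to the uniform distribution, which for $K$ classes simplifies to $\mathrm{KL}(p \,\|\, U) = \log K - H(p)$. The sketch below is an illustrative diagnostic, not a metric reported in the paper; the function names are hypothetical.

```python
import math
from typing import Sequence


def entropy(p: Sequence[float]) -> float:
    # Shannon entropy in nats; zero-probability entries contribute nothing.
    return -sum(x * math.log(x) for x in p if x > 0)


def kl_to_uniform(p: Sequence[float]) -> float:
    # KL(p || U) = sum_i p_i * log(p_i / (1/K)) = log K - H(p).
    # Equals 0 exactly when p is uniform, i.e. the model is fully undecided.
    return math.log(len(p)) - entropy(p)


def mean_decisiveness(probs: Sequence[Sequence[float]]) -> float:
    # Average distance from "undecided" over a batch of predictions.
    return sum(kl_to_uniform(p) for p in probs) / len(probs)
```

Averaged over hypothesis-only inputs, a value near 0 means the model stays undecided, while a value approaching $\log K$ (about 1.10 nats for three-way NLI) means it confidently assigns labels without the premise, the behavior Figure 1 associates with shortcut reliance.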