DBR: Divergence-Based Regularization for Debiasing Natural Language Understanding Models

Zihao Li, Ruixiang Tang, Lu Cheng, Shuaiqiang Wang, Dawei Yin, Mengnan Du

TL;DR

The paper tackles shortcut learning in pre-trained language models for natural language understanding, which harms out-of-domain generalization. It introduces Divergence-Based Regularization (DBR), a transparent debiasing framework that first identifies shortcut features and then applies a regularization loss to align predictions between original and unbiased inputs, using both hard and soft masking strategies. Empirical results across MNLI, FEVER, and QQP show improved OOD performance with little or no sacrifice to in-domain accuracy, supported by analyses of bias tokens, convergence dynamics, and confidence distributions. The work offers a practical and interpretable approach to reduce reliance on superficial cues, with potential extensions to large language models and prompting regimes.

Abstract

Pre-trained language models (PLMs) have achieved impressive results on various natural language processing tasks. However, recent research has revealed that these models often rely on superficial features and shortcuts instead of developing a genuine understanding of language, especially for natural language understanding (NLU) tasks. Consequently, the models struggle to generalize to out-of-domain data. In this work, we propose Divergence-Based Regularization (DBR) to mitigate this shortcut learning behavior. Our method measures the divergence between the output distributions for original examples and for examples in which shortcut tokens have been masked, and penalizes that divergence during training. This prevents the model's predictions from being overly influenced by shortcut features or biases. We evaluate our method on three NLU tasks and find that it improves out-of-domain performance with little loss of in-domain accuracy. Our results demonstrate that reducing the reliance on shortcuts and superficial features can enhance the generalization ability of large pre-trained language models.

Paper Structure

This paper contains 20 sections, 6 equations, 4 figures, 4 tables, 1 algorithm.

Figures (4)

  • Figure 1: The proposed DBR framework. We first train a shortcut identification model to compute the shortcut degree of each sample, then train the debiased model with a regularization loss based on the Jensen-Shannon divergence (JSD).
  • Figure 2: Attribution visualization. The first and second rows show the attribution of each word before and after mitigation, respectively. Words shaded in green contribute to the model's prediction; the darker the shade, the greater the contribution.
  • Figure 3: Confidence distributions of the identification model (orange) and the debiased model (green).
  • Figure 4: Loss function curves for three training approaches during the training stage: standard training, DBR-hard mask and DBR-soft mask.
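
The JSD-based regularization named in the Figure 1 caption can be illustrated with a small sketch. This is not the authors' implementation: it assumes the training objective adds a weighted Jensen-Shannon term to the usual cross-entropy, and the names `dbr_style_loss` and `reg_weight` are hypothetical. The logits stand in for the model's outputs on an original input and on the same input with shortcut tokens masked.

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded above by ln 2."""
    mid = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mid) + 0.5 * kl_divergence(q, mid)

def dbr_style_loss(logits_orig, logits_masked, label, reg_weight=1.0):
    """Cross-entropy on the original input plus a JSD penalty.

    Assumption: the two terms are combined additively with a scalar
    weight (reg_weight is a hypothetical hyperparameter name).
    """
    p = softmax(logits_orig)    # prediction on the original example
    q = softmax(logits_masked)  # prediction with shortcut tokens masked
    cross_entropy = -math.log(p[label])
    return cross_entropy + reg_weight * js_divergence(p, q)

# Example: the masked input shifts the prediction, so the penalty is positive.
loss_shifted = dbr_style_loss([2.0, 0.1, -1.0], [0.3, 0.2, 0.1], label=0)
loss_stable = dbr_style_loss([2.0, 0.1, -1.0], [2.0, 0.1, -1.0], label=0)
```

When the two sets of logits agree, the penalty vanishes and the loss reduces to plain cross-entropy; the more the prediction changes once shortcut tokens are masked, the larger the penalty.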