Explanation based Bias Decoupling Regularization for Natural Language Inference

Jianxiang Zang, Hui Liu

TL;DR

The paper tackles robustness in Natural Language Inference by shifting from sample-level debiasing to identifying the biased parts within each sample through human explanations. It introduces Explanation based Bias Decoupling Regularization (EBD-Reg), which uses three parallel supervision targets (Distinguishing, Decoupling, and Aligning) and Adaptive Token-level Attention (ATA) to emphasize keywords while suppressing biases, guided by explanations from the eSNLI dataset. The training objective combines the main loss with auxiliary losses, $\mathcal{L}=\mathcal{L}_{\text{main}}+\alpha\mathcal{L}_{ER}+\frac{\beta}{H}\sum_{h=1}^H(\mathcal{L}^h_{SA}+\mathcal{L}^h_{SI})$, with $\alpha=0.4$, $\beta=0.8$, and $H=3$. The method demonstrates improved out-of-distribution performance across SNLI, MNLI, RTE, QNLI, ANLI, SciTail, and HANS on multiple Transformer backbones. Empirical results show strong gains from combining EBD-Reg with ATA, and ablations highlight the complementary roles of keyword-vs-bias distinction, self-attention guidance, and sub-inference alignment. The approach provides a practical, interpretable route to debiasing NLI models and suggests broader applicability to other reasoning tasks where human explanations can identify causal features.
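
To make the arithmetic of the combined objective concrete, here is a minimal Python sketch of how the terms might be weighted and summed. It is an illustration under stated assumptions, not the authors' implementation: the function name `ebd_reg_total_loss` and its argument names are hypothetical, and because this summary does not define the individual terms $\mathcal{L}_{ER}$, $\mathcal{L}^h_{SA}$, and $\mathcal{L}^h_{SI}$, they are treated as precomputed scalar values.

```python
from typing import Sequence


def ebd_reg_total_loss(
    main_loss: float,
    er_loss: float,
    sa_losses: Sequence[float],
    si_losses: Sequence[float],
    alpha: float = 0.4,  # weight on L_ER, value reported in the summary
    beta: float = 0.8,   # weight on the averaged SA + SI terms
) -> float:
    """Combine scalars as L = L_main + alpha * L_ER + (beta / H) * sum_h (L_SA^h + L_SI^h).

    All names here are hypothetical; the inputs are assumed to be
    already-computed scalar loss values, one SA/SI pair per index h.
    """
    if len(sa_losses) != len(si_losses) or len(sa_losses) == 0:
        raise ValueError("sa_losses and si_losses must be non-empty and of equal length")

    num_terms = len(sa_losses)  # H in the formula (H = 3 in the summary)
    auxiliary_sum = sum(sa + si for sa, si in zip(sa_losses, si_losses))
    return main_loss + alpha * er_loss + (beta / num_terms) * auxiliary_sum


if __name__ == "__main__":
    # Made-up loss values, chosen only to show the weighting.
    total = ebd_reg_total_loss(
        main_loss=1.0,
        er_loss=0.5,
        sa_losses=[0.3, 0.3, 0.3],
        si_losses=[0.2, 0.2, 0.2],
    )
    print(f"total loss: {total:.4f}")
```

Dividing by $H$ averages the paired terms, so $\beta$ sets their overall weight regardless of how many indices are summed.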

Abstract

The robustness of Transformer-based Natural Language Inference encoders is frequently compromised as they tend to rely more on dataset biases than on the intended task-relevant features. Recent studies have attempted to mitigate this by reducing the weight of biased samples during the training process. However, these debiasing methods primarily focus on identifying which samples are biased without explicitly determining the biased components within each case. This limitation restricts those methods' capability in out-of-distribution inference. To address this issue, we aim to train models to adopt the logic humans use in explaining causality. We propose a simple, comprehensive, and interpretable method: Explanation based Bias Decoupling Regularization (EBD-Reg). EBD-Reg employs human explanations as criteria, guiding the encoder to establish a tripartite parallel supervision of Distinguishing, Decoupling and Aligning. This method enables encoders to identify and focus on keywords that represent the task-relevant features during inference, while discarding the residual elements acting as biases. Empirical evidence underscores that EBD-Reg effectively guides various Transformer-based encoders to decouple biases through a human-centric lens, significantly surpassing other methods in terms of out-of-distribution inference capabilities.

Paper Structure (19 sections, 10 equations, 5 figures, 4 tables)

Figures (5)

  • Figure 1: Overview of EBD-Reg, which includes 3 targets: (a) Distinguishing; (b) Decoupling; (c) Aligning.
  • Figure 2: Evaluation results for the eSNLI test set. SwapSyn-WN: Replaces words with synonyms provided by WordNet; SwapSyn-EM: Uses GloVe embeddings to replace common words with synonyms.
  • Figure 3: Implementing ATA, EBD-Reg, and their combination on other Transformer-based encoders, where the accuracy is the average performance of SNLI (test), MNLI (mm), RTE (dev). The results are the average of eight random seeds, with the error represented as the standard deviation.
  • Figure 4: The effect of $\alpha$, $\beta$, and $H$.
  • Figure 5: Case study on supervision of self-attention. The heatmap reports the normalized attention values between the $\text{[CLS]}$ token and the other tokens in the sentences. The bar chart presents the average proportion of normalized attention allocated to each part of speech.