Iterative Counterfactual Data Augmentation

Mitchell Plyler, Min Chi

TL;DR

Iterative Counterfactual Data Augmentation (ICDA) addresses spurious signals in NLP datasets by using rationale networks to identify the principal signal $X_1$ and suppress the spurious signal $X_2$, guided by the maximum mutual information (MMI) criterion $I(X_M;Y)$. ICDA iteratively generates counterfactuals, retrains the rationale selector on augmented data to reduce the selector error rate $\alpha$, and applies a fixed-point convergence argument to justify termination. Across eight datasets (six human-generated and two LLM-generated), ICDA improves rationale precision and outperforms MMI, CDA, and related baselines. The method is offline and computationally intensive but offers bias control without additional human annotation, with implications for fairness and interpretability in NLP.

Abstract

Counterfactual data augmentation (CDA) is a method for controlling information or biases in training datasets by generating a complementary dataset with typically opposing biases. Prior work often relies on either hand-crafted rules or algorithmic CDA methods, which can leave unwanted information in the augmented dataset. In this work, we show that iterative CDA (ICDA) with initial, high-noise interventions can converge to a state with significantly lower noise. Our ICDA procedure produces a dataset in which one target signal in the training dataset maintains high mutual information with a corresponding label and the information of spurious signals is reduced. We show that training on the augmented datasets produces rationales on documents that better align with human annotation. Our experiments include six human-produced datasets and two large language model-generated datasets.

Paper Structure

This paper contains 23 sections, 1 theorem, 17 equations, 4 figures, 3 tables, 2 algorithms.

Key Result

Theorem 1

Process eq:iter_alpha converges to expected error $\alpha^{k+1} = R(\delta^a=I(X_1,Y_1),n)$ under Algorithm alg:iter_cda for rationale scheme def:binary_rat, an $n$ such that $R^{-1} < J$ for some $\alpha \in [R(\delta^a=I(X_1,Y_1),n),\alpha_T)$, and an initial $\alpha^{k=0}$ such that $R(\delta^a=

Figures (4)

  • Figure 1: Increasing $\beta$ helps CDA.
  • Figure 2: (a) Definition of operator $J$. (b) MC definition of operator $R$ for $P(Y_1|X_1)=.95$. (c) Definition of the fixed-point process eq:iter_alpha from (a) and (b). (d) Convergence in simulation for an initial point $\alpha^{k=0}=.27$ determined by $J$ and initial conditions $\theta$.
  • Figure 3: Network configuration for rationalization over sentences with a hierarchical transformer.
  • Figure 4: Convergence results over all experiment runs. Note the general trend for the rationale error to decrease with CDA iterations. Rationale error $\alpha$ is taken as the complement of the rationale precision.
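
The fixed-point behavior described in the captions for Figures 2 and 4 can be illustrated with a minimal Python sketch. It iterates a scalar update $\alpha^{k+1} = f(\alpha^k)$ until successive values stop changing. The update map `toy_update`, the floor `ALPHA_FLOOR`, and the contraction rate `0.5` are hypothetical stand-ins chosen only for illustration. They are not the paper's operators $J$ and $R$, whose definitions are not reproduced on this page. Only the starting point $\alpha^{k=0}=.27$ is taken from the Figure 2 caption.

```python
from typing import Callable, List


def iterate_error(
    update: Callable[[float], float],
    alpha0: float,
    tol: float = 1e-6,
    max_iter: int = 100,
) -> List[float]:
    """Iterate alpha^{k+1} = update(alpha^k) and return the full trajectory.

    Stops when two successive values differ by less than `tol`
    or after `max_iter` updates, whichever comes first.
    """
    trajectory = [alpha0]
    for _ in range(max_iter):
        nxt = update(trajectory[-1])
        trajectory.append(nxt)
        if abs(nxt - trajectory[-2]) < tol:
            break
    return trajectory


# Hypothetical error floor (assumption, not a value from the paper).
ALPHA_FLOOR = 0.05


def toy_update(alpha: float) -> float:
    """Hypothetical affine contraction toward ALPHA_FLOOR.

    Stands in for one round of "generate counterfactuals, retrain the
    selector"; it is NOT the paper's composition of J and R.
    """
    return ALPHA_FLOOR + 0.5 * (alpha - ALPHA_FLOOR)


# Initial error 0.27 matches the starting point quoted for Figure 2(d).
trajectory = iterate_error(toy_update, alpha0=0.27)
for k, alpha in enumerate(trajectory[:5]):
    print(f"k={k}  alpha={alpha:.4f}")
```

In this toy setting the error shrinks at every step and settles at `ALPHA_FLOOR`. In the paper the limit and the conditions for reaching it are instead determined by Theorem 1.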

Theorems & Definitions (2)

  • Definition 1
  • Theorem 1