CoBA: Counterbias Text Augmentation for Mitigating Various Spurious Correlations via Semantic Triples

Kyohoon Jin, Juhwan Choi, Jungmin Yun, Junho Lee, Soojin Jang, Youngbin Kim

TL;DR

This work tackles spurious correlations in NLP by introducing CoBA, a counterbias data augmentation framework that operates at the semantic-triple level. It decomposes text into subject-predicate-object triples, identifies principal and spurious words via a majority-voting ensemble, and performs triple-level manipulation before reconstructing text with an LLM to generate diverse counterbias data. Empirically, CoBA improves in-distribution task performance, mitigates various biases (including gender bias), and enhances out-of-distribution robustness across sentiment analysis, natural language inference, and generation tasks, outperforming conventional counterfactual augmentation methods. The approach offers a versatile and scalable way to curb spurious correlations while expanding data diversity and understanding model behavior through semantic triples. Limitations include potential information loss in triples, bias scope, and reliance on ensemble strategies, with future work aiming to refine decomposition fidelity and broaden applicability.

Abstract

Deep learning models often learn and exploit spurious correlations in training data, using these non-target features to inform their predictions. Such reliance leads to performance degradation and poor generalization on unseen data. To address these limitations, we introduce a more general form of counterfactual data augmentation, termed counterbias data augmentation, which simultaneously tackles multiple biases (e.g., gender bias, simplicity bias) and enhances out-of-distribution robustness. We present CoBA: CounterBias Augmentation, a unified framework that operates at the semantic triple level: first decomposing text into subject-predicate-object triples, then selectively modifying these triples to disrupt spurious correlations. By reconstructing the text from these adjusted triples, CoBA generates counterbias data that mitigates spurious patterns. Through extensive experiments, we demonstrate that CoBA not only improves downstream task performance, but also effectively reduces biases and strengthens out-of-distribution resilience, offering a versatile and robust solution to the challenges posed by spurious correlations.
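The triple-level steps described above can be illustrated with a minimal sketch. This is not the authors' implementation: the `Triple` type, the function names `majority_vote_spurious` and `swap_spurious`, the strict-majority threshold, and the word-replacement map are all assumptions made for illustration, and the detectors that produce the votes as well as the LLM-based text reconstruction are left out.

```python
from collections import Counter
from typing import NamedTuple


class Triple(NamedTuple):
    """A subject-predicate-object triple (hypothetical representation)."""
    subject: str
    predicate: str
    obj: str


def majority_vote_spurious(votes_per_detector: list[set[str]]) -> set[str]:
    """Keep the words that a strict majority of detectors flag as spurious.

    Each element of votes_per_detector is the set of words one (hypothetical)
    detector considers spurious.
    """
    counts = Counter(word for votes in votes_per_detector for word in votes)
    threshold = len(votes_per_detector) / 2
    return {word for word, n in counts.items() if n > threshold}


def swap_spurious(
    triples: list[Triple],
    spurious: set[str],
    replacements: dict[str, str],
) -> list[Triple]:
    """Replace spurious words inside every triple slot.

    Words without an entry in the (assumed) replacement map are left unchanged.
    """
    def fix(text: str) -> str:
        return " ".join(
            replacements.get(tok, tok) if tok in spurious else tok
            for tok in text.split()
        )

    return [Triple(fix(t.subject), fix(t.predicate), fix(t.obj)) for t in triples]


# Toy example: three detectors vote, and only "he" reaches a majority.
votes = [{"he", "movie"}, {"he"}, {"he", "great"}]
spurious_words = majority_vote_spurious(votes)

triples = [Triple("he", "directed", "a great movie")]
counterbias_triples = swap_spurious(triples, spurious_words, {"he": "she"})
print(counterbias_triples)
```

In a full pipeline the modified triples would then be passed to an LLM to reconstruct fluent text; that step is omitted here because its prompt and model are not specified in this section.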

Paper Structure

This paper contains 40 sections, 1 figure, 16 tables.

Figures (1)

  • Figure 1: Overall procedure of CoBA.