Explore Spurious Correlations at the Concept Level in Language Models for Text Classification

Yuhang Zhou, Paiheng Xu, Xiaoyu Liu, Bang An, Wei Ai, Furong Huang

TL;DR

This work shows that language models learn concept-level spurious correlations, driven by imbalanced concept-label distributions, in both fine-tuning and in-context learning. It introduces a Bias@C metric and uses ChatGPT to annotate concepts, enabling measurement of concept shortcuts across datasets. It also proposes an upsampling strategy based on counterfactual data generated by large language models, which is shown to reduce concept bias while maintaining or improving utility across multiple models and tasks. The findings highlight the need to account for high-level semantic content in robustness analyses and offer a practical mitigation pathway for reliable NLP deployment.

Abstract

Language models (LMs) have achieved notable success in numerous NLP tasks, employing both fine-tuning and in-context learning (ICL) methods. While language models demonstrate exceptional performance, they face robustness challenges due to spurious correlations arising from imbalanced label distributions in training data or ICL exemplars. Previous research has primarily concentrated on word, phrase, and syntax features, neglecting the concept level, often due to the absence of concept labels and difficulty in identifying conceptual content in input texts. This paper introduces two main contributions. First, we employ ChatGPT to assign concept labels to texts, assessing concept bias in models during fine-tuning or ICL on test data. We find that LMs, when encountering spurious correlations between a concept and a label in training or prompts, resort to shortcuts for predictions. Second, we introduce a data rebalancing technique that incorporates ChatGPT-generated counterfactual data, thereby balancing label distribution and mitigating spurious correlations. Our method's efficacy, surpassing traditional token removal approaches, is validated through extensive testing.
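
The rebalancing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `(text, concept, label)` tuple layout, the function names, and the `generate` callback (a stand-in for an LLM call that writes a counterfactual text) are all assumptions made for illustration. The sketch counts labels per concept and adds generated examples until every label is as frequent as that concept's majority label.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical record layout: (text, concept, label).
Example = Tuple[str, str, int]


def counterfactual_deficits(
    examples: List[Example], labels: List[int]
) -> Dict[str, Dict[int, int]]:
    """For each concept, count how many extra examples each label needs
    to match that concept's most frequent label."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for _, concept, label in examples:
        counts[concept][label] += 1
    deficits = {}
    for concept, label_counts in counts.items():
        target = max(label_counts.values())
        deficits[concept] = {lab: target - label_counts.get(lab, 0) for lab in labels}
    return deficits


def rebalance(
    examples: List[Example],
    labels: List[int],
    generate: Callable[[str, int, int], str],
) -> List[Example]:
    """Append generated counterfactual examples so that every concept has a
    uniform label distribution. `generate(concept, label, i)` is an assumed
    placeholder for an LLM-backed text generator."""
    augmented = list(examples)
    for concept, need in counterfactual_deficits(examples, labels).items():
        for label, n_missing in need.items():
            for i in range(n_missing):
                augmented.append((generate(concept, label, i), concept, label))
    return augmented
```

In practice `generate` would wrap a prompt to a language model; any deterministic stub works for testing the bookkeeping.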

Paper Structure (24 sections, 4 equations, 4 figures, 12 tables)

This paper contains 24 sections, 4 equations, 4 figures, 12 tables.

Figures (4)

  • Figure 1: Example of concept-level spurious correlations. In the training data or demonstrations, texts containing the concept "food" mostly carry label 1 (positive sentiment). At test time, when a model encounters a sentence with the tokens "Thai steak," which do not appear in the training data or prompts but indicate the concept "food", it relies on the shortcut between the concept "food" and label 1 and gives the wrong prediction.
  • Figure 2: A concept can be expressed in multiple ways, and in the embedding space of LMs these expressions of one concept can be mapped to similar positions. LMs form a shortcut between a specific concept and a label and use it in future predictions.
  • Figure 3: Label distribution of the texts with a specific concept for each dataset. The label distributions of several concepts, such as "music" in IMDB and "food" in Yelp, are highly imbalanced.
  • Figure 4: Clusters of word embeddings of the top associated tokens for each concept from the Amazon shoe dataset. The dendrogram on the side indicates the hierarchical clustering structure among the tokens.
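
The per-concept label distribution described in the Figure 3 caption can be computed with a few lines of Python. This is a descriptive sketch only, not the paper's Bias@C metric; the `(concept, label)` record layout and the function name are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def concept_label_distribution(
    records: Iterable[Tuple[str, Hashable]]
) -> Dict[str, Dict[Hashable, float]]:
    """Return, for each concept, the fraction of its texts carrying each label.

    `records` is an assumed iterable of (concept, label) pairs, for example
    produced by annotating each text with a concept.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for concept, label in records:
        counts[concept][label] += 1
    return {
        concept: {label: n / sum(label_counts.values()) for label, n in label_counts.items()}
        for concept, label_counts in counts.items()
    }
```

A concept whose distribution is far from uniform, such as one label holding most of the mass, is a candidate for the kind of shortcut illustrated in Figure 1.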