Exploring the Robustness of In-Context Learning with Noisy Labels

Chen Cheng; Xinzhi Yu; Haodong Wen; Jingsong Sun; Guanzhang Yue; Yihao Zhang; Zeming Wei

TL;DR

This work investigates the resilience of Transformer-based In-Context Learning (ICL) to noisy labels in demonstrations and training data, addressing a practical concern for real-world language models. It adopts a controlled noisy linear regression framework with the simple function class $f_w(x)=w^T x$ to study ICL robustness, and evaluates performance against baselines using a large synthetic prompt corpus. Key findings show that Transformers are fairly robust to label noise in demonstrations across several symmetric noise types, and that introducing similar noise into training data via curriculum learning can further enhance robustness, though multiplicative and salt-and-pepper noise remain more challenging. The results offer actionable insights into ICL behavior, data augmentation strategies, and safety considerations for alignment in real-world NLP systems, with code made publicly available.

Abstract

Recently, the mysterious In-Context Learning (ICL) ability exhibited by Transformer architectures, especially in large language models (LLMs), has sparked significant research interest. However, the resilience of Transformers' in-context learning capabilities in the presence of noisy samples, prevalent in both training corpora and prompt demonstrations, remains underexplored. In this paper, inspired by prior research that studies ICL ability using simple function classes, we take a closer look at this problem by investigating the robustness of Transformers against noisy labels. Specifically, we first conduct a thorough evaluation and analysis of the robustness of Transformers against noisy labels during in-context learning and show that they exhibit notable resilience against diverse types of noise in demonstration labels. Furthermore, we delve deeper into this problem by exploring whether introducing noise into the training set, akin to a form of data augmentation, enhances such robustness during inference, and find that such noise can indeed improve the robustness of ICL. Overall, our analysis and findings provide a comprehensive understanding of the resilience of Transformer models against label noise during ICL and offer valuable insights for research on Transformers in natural language processing. Our code is available at https://github.com/InezYu0928/in-context-learning.
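
To make the setup concrete, the following is a minimal sketch of how one noisy prompt for the function class $f_w(x)=w^T x$ might be generated. It is not the authors' implementation: the function names, the Gaussian sampling of $w$ and $x$, the default sizes, and the exact definitions of the three noise types (in particular the salt-and-pepper rule) are assumptions made for illustration. The least-squares helper is likewise an assumed example of a baseline, not one confirmed by the text above.

```python
import numpy as np

def make_noisy_prompt(n_points=20, dim=5, sigma=0.5, noise="gaussian", seed=0):
    """Build one synthetic prompt for f_w(x) = w^T x with noisy labels.

    All sampling choices here are illustrative assumptions.
    Returns (w, xs, clean_labels, noisy_labels).
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(dim)               # hidden weight vector (assumed Gaussian)
    xs = rng.standard_normal((n_points, dim))  # in-context inputs (assumed Gaussian)
    clean = xs @ w                             # noiseless labels w^T x

    if noise == "gaussian":
        # Additive noise with standard deviation sigma.
        noisy = clean + sigma * rng.standard_normal(n_points)
    elif noise == "multiplicative":
        # Assumed form: each label is scaled by (1 + sigma * eps).
        noisy = clean * (1.0 + sigma * rng.standard_normal(n_points))
    elif noise == "salt_pepper":
        # Assumed form: with probability sigma, replace a label by the
        # minimum or maximum clean label in the prompt.
        mask = rng.random(n_points) < sigma
        extremes = rng.choice([clean.min(), clean.max()], size=n_points)
        noisy = np.where(mask, extremes, clean)
    else:
        raise ValueError(f"unknown noise type: {noise}")
    return w, xs, clean, noisy

def least_squares_error(xs, ys, w_true):
    """Squared error between the least-squares estimate and the true w."""
    w_hat, *_ = np.linalg.lstsq(xs, ys, rcond=None)
    return float(np.sum((w_hat - w_true) ** 2))

if __name__ == "__main__":
    for kind in ("gaussian", "multiplicative", "salt_pepper"):
        w, xs, clean, noisy = make_noisy_prompt(noise=kind)
        print(kind, least_squares_error(xs, noisy, w))
```

Under these assumptions, setting `sigma=0.0` reproduces the clean labels, so the same generator can serve both the noiseless and the noisy conditions.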

Paper Structure (14 sections, 2 equations, 8 figures, 1 table)

Introduction
Preliminaries
Comprehensive evaluation of noisy ICL inference
Influence of training with noisy labels on robustness
Conclusion
Additional related work
Understanding In-context Learning
Noisy label learning
Language model safety and alignment
Complete visualizations for the main evaluation section
Non-i.i.d. noises
Different function complexities
Different input dimensions
Revisiting input dimensions with noisy label training

Figures (8)

Figure 1: Robustness comparison under different noise types and magnitudes. Each panel corresponds to one noise type and each line to one value of $\sigma_{test}$. The X-axis represents the number of in-context examples.
Figure 2: The training loss for a standard-sized transformer with Gaussian noise of $\sigma_{train} \in \left \{ 0.0,0.2,0.4,0.6,0.8,1.0 \right \}$.
Figure 3: Effects of Gaussian Noise Magnitude and Model Size on Performance.
Figure 4: Robustness comparison under different noise types and magnitudes. Each panel corresponds to one noise type and each line to one value of $\sigma_{test}$. The X-axis represents the number of in-context examples.
Figure 5: In-context learning on prompts with outliers. We evaluate the trained model on prompts with outliers in two cases. (1) On the rows, we explored the effect of the number of outliers ({1,2,4,8,16}), and (2) on the columns, we explored the effect of the magnitude of outliers ({0.1,1,10,100}). (Error averaged over 1280 prompts. 90% confidence intervals over 1000 bootstrap trials.)
...and 3 more figures
