Exploring the Robustness of In-Context Learning with Noisy Labels
Chen Cheng, Xinzhi Yu, Haodong Wen, Jingsong Sun, Guanzhang Yue, Yihao Zhang, Zeming Wei
TL;DR
This work investigates the robustness of Transformer-based in-context learning (ICL) to noisy labels in both prompt demonstrations and training data, a practical concern for real-world language models. It adopts a controlled noisy linear regression framework over the simple function class $f_w(x)=w^\top x$ and compares Transformer performance against baselines on a large corpus of synthetic prompts. The key findings are that Transformers are fairly robust to several types of symmetric label noise in demonstrations, and that injecting similar noise into the training data via curriculum learning can further improve this robustness, although multiplicative and salt-and-pepper noise remain more challenging. The results offer actionable insights into ICL behavior, data augmentation strategies, and safety considerations for alignment in real-world NLP systems; the code is publicly available.
Abstract
Recently, the still poorly understood in-context learning (ICL) ability exhibited by Transformer architectures, especially in large language models (LLMs), has sparked significant research interest. However, the resilience of Transformers' in-context learning capabilities in the presence of noisy samples, which are prevalent in both training corpora and prompt demonstrations, remains underexplored. In this paper, inspired by prior research that studies ICL using simple function classes, we take a closer look at this problem by investigating the robustness of Transformers against noisy labels. Specifically, we first conduct a thorough evaluation and analysis of Transformers' robustness to noisy labels during in-context learning and show that they exhibit notable resilience against diverse types of noise in demonstration labels. We then explore whether introducing noise into the training set, as a form of data augmentation, enhances this robustness at inference time, and find that such noise can indeed improve the robustness of ICL. Overall, our analysis and findings provide a comprehensive understanding of the resilience of Transformer models to label noise during ICL and offer valuable insights for research on Transformers in natural language processing. Our code is available at https://github.com/InezYu0928/in-context-learning.
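To make the noisy-label setup concrete, here is a minimal NumPy sketch of one in-context prompt for the linear class $f_w(x)=w^\top x$. It draws $w$ and the inputs from standard Gaussians, corrupts only the demonstration labels, and scores a least-squares fit (a natural baseline for this task) on a clean query point. Everything here is an illustrative assumption rather than the authors' code: the helper names `sample_prompt`, `add_label_noise`, and `least_squares_query_error` are hypothetical, as are the dimension, prompt length, and the exact definitions and parameters (`scale`, `p`) of the four noise types.

```python
import numpy as np


def sample_prompt(n_points, dim, rng):
    """Draw w ~ N(0, I), inputs x_i ~ N(0, I), and clean labels y_i = w^T x_i."""
    w = rng.standard_normal(dim)
    xs = rng.standard_normal((n_points, dim))
    ys = xs @ w
    return w, xs, ys


def add_label_noise(ys, kind, scale, rng, p=0.1):
    """Corrupt demonstration labels.

    The noise definitions below are illustrative assumptions, not the
    paper's exact formulations.
    """
    if kind == "gaussian":  # additive, symmetric, zero mean
        return ys + scale * rng.standard_normal(ys.shape)
    if kind == "uniform":  # additive, symmetric, bounded by +/- scale
        return ys + rng.uniform(-scale, scale, ys.shape)
    if kind == "multiplicative":  # noise magnitude grows with |y|
        return ys * (1.0 + scale * rng.standard_normal(ys.shape))
    if kind == "salt_and_pepper":
        # With probability p, replace a label by an extreme value +/- scale.
        mask = rng.random(ys.shape) < p
        extremes = scale * rng.choice([-1.0, 1.0], size=ys.shape)
        return np.where(mask, extremes, ys)
    raise ValueError(f"unknown noise kind: {kind}")


def least_squares_query_error(xs, noisy_ys, x_query, y_query):
    """Fit w by least squares on the noisy demonstrations; return squared error on the clean query."""
    w_hat, *_ = np.linalg.lstsq(xs, noisy_ys, rcond=None)
    return float((x_query @ w_hat - y_query) ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim, n_demos = 20, 40  # assumed sizes, chosen only for illustration
    _, xs, ys = sample_prompt(n_demos + 1, dim, rng)
    demo_x, demo_y = xs[:n_demos], ys[:n_demos]
    x_q, y_q = xs[n_demos], ys[n_demos]  # the query label stays clean
    for kind in ["gaussian", "uniform", "multiplicative", "salt_and_pepper"]:
        noisy = add_label_noise(demo_y, kind, scale=1.0, rng=rng)
        err = least_squares_query_error(demo_x, noisy, x_q, y_q)
        print(f"{kind:>16}: squared error {err:.3f}")
```

In an actual robustness study, a trained Transformer would take the place of `least_squares_query_error`: it reads the interleaved sequence $(x_1, \tilde y_1, \dots, x_k, \tilde y_k, x_{\text{query}})$ and predicts $y_{\text{query}}$, and its error can be compared with the baseline at each noise type and level. Keeping the query label clean measures how well the prediction recovers the true function; that choice is also an assumption of this sketch.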
