In-Context Learning (and Unlearning) of Length Biases
Stephanie Schoch, Yangfeng Ji
TL;DR
This work investigates whether large language models acquire length-based statistical biases through in-context learning (ICL). The authors quantify how demonstration length, the number of demonstrations, and model size influence the emergence of length bias, using seven binary classification datasets and multiple model families. They show that length biases can be learned in-context and that longer demonstrations can increase the magnitude of the bias, even when the underlying fine-tuning did not exploit such cues. They also show that ICL can debias fine-tuned models, either by sampling demonstrations from opposite-length tails or by random sampling, which offers a cost-effective way to mitigate bias without parameter updates. The findings underscore the need for balanced demonstration sampling in prompts and offer practical guidance for designing robust ICL pipelines and debiasing strategies.
Abstract
Large language models have demonstrated strong capabilities to learn in-context, where exemplar input-output pairings are appended to the prompt for demonstration. However, existing work has demonstrated the ability of models to learn lexical and label biases in-context, which negatively impacts both performance and robustness of models. The impact of other statistical data biases remains under-explored, which this work aims to address. We specifically investigate the impact of length biases on in-context learning. We demonstrate that models do learn length biases in the context window for their predictions, and further empirically analyze the factors that modulate the level of bias exhibited by the model. In addition, we show that learning length information in-context can be used to counter the length bias that has been encoded in models (e.g., via fine-tuning). This reveals the power of in-context learning in debiasing model prediction behaviors without the need for costly parameter updates.
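To make the demonstration-sampling idea concrete, the following is a minimal sketch of how length-skewed and random demonstration sets might be drawn from a labeled pool. It is an illustration under stated assumptions, not the authors' implementation: the function names (`sample_by_length`, `sample_random`, `build_prompt`), the whitespace word count as the length measure, the `tail` fraction, and the `Input:`/`Label:` prompt template are all hypothetical choices made here.

```python
import random


def word_length(text):
    """Length measure assumed for this sketch: whitespace-separated word count."""
    return len(text.split())


def sample_by_length(pool, k_per_class, long_label, tail=0.25, seed=0):
    """Draw k demonstrations per class from opposite length tails.

    Examples labeled `long_label` come from the longest `tail` fraction of
    their class; every other label comes from the shortest `tail` fraction.
    `pool` is a list of (text, label) pairs.
    """
    rng = random.Random(seed)
    demos = []
    for label in sorted({l for _, l in pool}):
        items = sorted((t for t, l in pool if l == label), key=word_length)
        if len(items) < k_per_class:
            raise ValueError(f"not enough examples for label {label!r}")
        # Keep at least k candidates so sampling without replacement succeeds.
        cut = max(k_per_class, int(len(items) * tail))
        candidates = items[-cut:] if label == long_label else items[:cut]
        demos += [(t, label) for t in rng.sample(candidates, k_per_class)]
    rng.shuffle(demos)  # avoid grouping demonstrations by class in the prompt
    return demos


def sample_random(pool, k_per_class, seed=0):
    """Baseline: draw k demonstrations per class uniformly, ignoring length."""
    rng = random.Random(seed)
    demos = []
    for label in sorted({l for _, l in pool}):
        items = [t for t, l in pool if l == label]
        demos += [(t, label) for t in rng.sample(items, k_per_class)]
    rng.shuffle(demos)
    return demos


def build_prompt(demos, query):
    """Format demonstrations and the query with a hypothetical template."""
    blocks = [f"Input: {text}\nLabel: {label}" for text, label in demos]
    blocks.append(f"Input: {query}\nLabel:")
    return "\n\n".join(blocks)
```

Under these assumptions, a model that has learned to associate long inputs with one class could be prompted with `sample_by_length(pool, k, long_label=other_class)`, so that the demonstrations pair length with the opposite label, while `sample_random` gives the length-agnostic alternative.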
