Revisiting Zero-Shot Abstractive Summarization in the Era of Large Language Models from the Perspective of Position Bias
Anshuman Chhabra, Hadi Askari, Prasant Mohapatra
TL;DR
The paper tackles zero-shot abstractive summarization with Large Language Models by introducing position bias as a generalization of lead bias. It proposes a measurement framework that maps summary content back to article sentences, segments articles into $K$ parts, and compares gold vs. model-derived positional distributions using the Wasserstein distance $W$ to quantify bias. Empirically, LLMs generally achieve high ROUGE scores with low position bias across most datasets, though XSum exhibits stronger lead bias; encoder–decoder baselines show higher bias in zero-shot settings. The work further demonstrates that finetuning alignment and prompt engineering can influence bias, and provides open-source code to support reproducibility and further study.
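The measurement idea in the TL;DR can be illustrated with a minimal sketch. It assumes the mapping from summary sentences to article sentences has already been done and is supplied as a list of 0-based sentence indices. The function names, the equal-width segmentation rule, and the unit spacing between segments are illustrative assumptions, not the authors' implementation.

```python
from itertools import accumulate


def segment_index(sentence_idx, num_sentences, k):
    """Map a 0-based sentence index to one of k equal-width segments (assumed rule)."""
    return min(k - 1, sentence_idx * k // num_sentences)


def positional_distribution(mapped_indices, num_sentences, k):
    """Normalized histogram over the k segments that a summary draws its content from."""
    if not mapped_indices:
        raise ValueError("summary maps to no article sentences")
    counts = [0] * k
    for idx in mapped_indices:
        counts[segment_index(idx, num_sentences, k)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def wasserstein_1d(p, q):
    """1-Wasserstein distance between two distributions on segments 0..k-1.

    With unit spacing between segments, this equals the sum of absolute
    differences between the two cumulative distributions.
    """
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


# Toy input (hypothetical): a 9-sentence article split into K = 3 segments.
gold = positional_distribution([0, 4, 8], num_sentences=9, k=3)   # spread evenly
model = positional_distribution([0, 1, 2], num_sentences=9, k=3)  # lead-only
bias = wasserstein_1d(gold, model)  # approximately 1.0 for this toy input
```

A larger value of `bias` means the model summary draws on different parts of the article than the gold summary does. In this toy case the model summary uses only the first segment, which is the lead-bias pattern that position bias generalizes.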
Abstract
We characterize and study zero-shot abstractive summarization in Large Language Models (LLMs) by measuring position bias, which we propose as a general formulation of the more restrictive lead bias phenomenon studied previously in the literature. Position bias captures the tendency of a model to unfairly prioritize information from certain parts of the input text over others, leading to undesirable behavior. Through numerous experiments on four diverse real-world datasets, we study position bias in multiple LLMs, such as GPT 3.5-Turbo, Llama-2, and Dolly-v2, as well as in state-of-the-art pretrained encoder-decoder abstractive summarization models such as Pegasus and BART. Our findings lead to novel insights and discussion on the performance and position bias of models for zero-shot summarization tasks.
