Beyond the Next Token: Towards Prompt-Robust Zero-Shot Classification via Efficient Multi-Token Prediction

Junlang Qian; Zixiao Zhu; Hanzhang Zhou; Zijian Feng; Zepeng Zhai; Kezhi Mao

TL;DR

The paper addresses prompt brittleness in zero-shot text classification by moving beyond sole reliance on next-token predictions and introducing Placeholding Parallel Prediction ($\mathcal{P}^3$). $\mathcal{P}^3$ enables multiple subsequent token predictions in a single LM run by appending placeholder tokens and aggregating predictions, reducing sensitivity to prompt wording while maintaining efficiency. Empirical results across seven datasets with LLaMA2-13B/70B demonstrate substantial reductions in cross-prompt variance and improved accuracy, with strong performance even without prompts. This approach significantly lowers the need for prompt engineering, enhances robustness, and offers a scalable, efficient pathway for robust zero-shot classification in practical deployments.

Abstract

Zero-shot text classification typically relies on prompt engineering, but the inherent prompt brittleness of large language models undermines its reliability. Minor changes to a prompt can cause significant discrepancies in model performance. We attribute this prompt brittleness largely to the narrow focus on next-token probabilities in existing methods. To address this, we propose Placeholding Parallel Prediction ($\mathcal{P}^3$), a novel approach that predicts token probabilities across multiple positions and simulates comprehensive sampling of generation paths in a single run of a language model. Experiments show improved accuracy and up to 98% reduction in the standard deviation across prompts, boosting robustness. Even without a prompt, $\mathcal{P}^3$ maintains comparable performance, reducing the need for prompt engineering.
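The aggregation idea described above can be illustrated with a minimal sketch. It assumes that a single forward pass over the input followed by a few placeholder tokens has already produced one probability distribution per predicted position. The function names, the label words, the toy probabilities, and the choice of a plain average are all illustrative assumptions, not the paper's exact procedure.

```python
from typing import Dict, List


def aggregate_label_scores(
    position_probs: List[Dict[str, float]],
    label_words: Dict[str, str],
) -> Dict[str, float]:
    """Average each label word's probability over all predicted positions.

    position_probs[i] is a (hypothetical) token -> probability mapping for
    the i-th position after the input text. Averaging is an assumed
    aggregation rule chosen for illustration.
    """
    n = len(position_probs)
    if n == 0:
        raise ValueError("need at least one predicted position")
    return {
        label: sum(probs.get(word, 0.0) for probs in position_probs) / n
        for label, word in label_words.items()
    }


def classify(
    position_probs: List[Dict[str, float]],
    label_words: Dict[str, str],
) -> str:
    """Return the label with the highest aggregated score."""
    scores = aggregate_label_scores(position_probs, label_words)
    return max(scores, key=scores.get)


# Made-up distributions standing in for one model run over
# "<text> <placeholder> <placeholder> <placeholder>".
toy_position_probs = [
    {"sports": 0.10, "politics": 0.15, "the": 0.40},  # next token
    {"sports": 0.30, "politics": 0.05, "the": 0.20},  # second position
    {"sports": 0.25, "politics": 0.10, "the": 0.10},  # third position
]
label_words = {"Sports": "sports", "Politics": "politics"}

# Using only the first position mimics next-token classification.
next_token_only = classify(toy_position_probs[:1], label_words)
# Using all positions mimics multi-position aggregation.
multi_position = classify(toy_position_probs, label_words)
```

In this toy data the next-token distribution alone favors one label while the later positions favor the other, which is the kind of disagreement that aggregation across positions is meant to smooth out.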
