Unveiling the Lexical Sensitivity of LLMs: Combinatorial Optimization for Prompt Enhancement
Pengwei Zhan, Zhen Xu, Qian Tan, Jie Song, Ru Xie
TL;DR
The paper shows that large language models exhibit pronounced sensitivity to lexical variations in prompts, even when changes are nearly imperceptible to humans. It introduces COPLE, a black-box combinatorial optimization framework that iteratively substitutes semantically similar words in the task description based on feedback from proxy tasks to maximize downstream performance. Across GLUE and MMLU benchmarks and multiple models, COPLE substantially improves results relative to human-crafted prompts and other prompting baselines, demonstrating that lexical optimization can recover instruction-following and task-solving abilities. The work highlights the importance of evaluating and optimizing the exact wording of prompts prior to more complex prompt engineering, with implications for robustness and reproducibility of LLM-based systems.
Abstract
Large language models (LLMs) demonstrate exceptional instruction-following ability in completing various downstream tasks. Although this ability makes LLMs flexible task solvers, their performance also relies heavily on the instructions they are given. In this paper, we reveal that LLMs are over-sensitive to lexical variations in task instructions, even when the variations are imperceptible to humans. When models are given neighborhood instructions, which are closely situated in the latent representation space and differ by only one semantically similar word, their performance on downstream tasks can differ vastly. Building on this property, we propose a black-box Combinatorial Optimization framework for Prompt Lexical Enhancement (COPLE). COPLE performs iterative lexical optimization according to feedback from a batch of proxy tasks, using a search strategy related to word influence. Experiments show that even widely used human-crafted prompts for current benchmarks suffer from the lexical sensitivity of models, and that COPLE recovers the degraded model ability in both instruction following and solving downstream tasks.
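To make the idea of feedback-driven lexical optimization concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify how word influence is computed or how candidates are generated, so everything here is an assumption for illustration: influence is taken as the score drop when a word is deleted, the candidate table is hand-written, and `toy_score` is a hypothetical stand-in for the accuracy a black-box model would achieve on a batch of proxy tasks.

```python
from typing import Callable, Dict, List, Tuple

Score = Callable[[List[str]], float]


def word_influence(words: List[str], score: Score) -> List[float]:
    # Assumed definition: influence of a word is the drop in proxy score
    # when that word is deleted from the prompt.
    base = score(words)
    return [base - score(words[:i] + words[i + 1:]) for i in range(len(words))]


def lexical_search(
    prompt: str,
    candidates: Dict[str, List[str]],
    score: Score,
) -> Tuple[str, float]:
    # Greedy search: visit words from most to least influential and keep
    # a substitution only if it improves the proxy score.
    words = prompt.split()
    influence = word_influence(words, score)
    order = sorted(range(len(words)), key=lambda i: influence[i], reverse=True)
    best = score(words)
    for i in order:
        for sub in candidates.get(words[i], []):
            trial = words[:i] + [sub] + words[i + 1:]
            trial_score = score(trial)
            if trial_score > best:
                words, best = trial, trial_score
    return " ".join(words), best


def toy_score(words: List[str]) -> float:
    # Hypothetical stand-in for proxy-task accuracy; a real setup would
    # query the model on a batch of proxy examples with this prompt.
    preferred = {"determine": 2, "sentiment": 3, "review": 1}
    return float(sum(preferred.get(w.lower(), 0) for w in words))


original = "Classify the sentiment of this review"
# Hypothetical table of semantically similar substitutes.
similar = {"Classify": ["Determine", "Identify"], "review": ["text"]}

best_prompt, best_score = lexical_search(original, similar, toy_score)
print(best_prompt, best_score)
```

The scoring function is called only as a black box, so no gradients or model internals are needed, which matches the black-box setting the abstract describes. A greedy pass is only one possible search strategy; it is used here because it is short, not because the paper prescribes it.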
