Fool Your (Vision and) Language Model With Embarrassingly Simple Permutations
Yongshuo Zong, Tingyang Yu, Ruchika Chavhan, Bingchen Zhao, Timothy Hospedales
TL;DR
Fool Your (Vision and) Language Model With Embarrassingly Simple Permutations analyzes how accuracy on multiple-choice question answering (MCQA) collapses when the answer options are permuted, across both large language models (LLMs) and vision-language models (VLLMs). It highlights a systemic vulnerability whose search space grows as $k!$ with the number of options $k$. The authors run a broad empirical study across model families and datasets, showing that permutation attacks sharply reduce accuracy, sometimes to below random chance, and that answer-set pruning offers little protection. They investigate possible causes, including position bias and symbol-content shortcuts, and assess mitigations such as post-hoc voting and permutation-aware fine-tuning, finding limited success for the former but some promise for training-time defenses. The work warns against overreliance on MCQA benchmarks, motivates permutation-robust architectures and evaluation protocols, and provides a public codebase for replication.
Abstract
Large language and vision-language models are rapidly being deployed in practice thanks to their impressive capabilities in instruction following, in-context learning, and so on. This raises an urgent need to carefully analyse their robustness so that stakeholders can understand if and when such models are trustworthy enough to be relied upon in any given application. In this paper, we highlight a specific vulnerability in popular models, namely permutation sensitivity in multiple-choice question answering (MCQA). Specifically, we show empirically that popular models are vulnerable to adversarial permutation in answer sets for multiple-choice prompting, which is surprising as models should ideally be as invariant to prompt permutation as humans are. These vulnerabilities persist across various model sizes, and exist in very recent language and vision-language models. Code is available at https://github.com/ys-zong/FoolyourVLLMs.
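To make the notion of an adversarial permutation concrete, the following is a minimal sketch, not the authors' implementation. It assumes the attack counts as successful if at least one ordering of the options makes the model answer incorrectly, and it treats the model as a hypothetical `predict` callable that maps a prompt string to an option letter. The names `format_prompt`, `permutation_attack`, and `always_a` are illustrative only.

```python
from itertools import permutations
from typing import Callable, Optional, Sequence, Tuple

LABELS = "ABCDEFGH"  # option symbols; supports up to 8 options in this sketch


def format_prompt(question: str, options: Sequence[str]) -> str:
    """Render a question and its options as a plain MCQA prompt (assumed format)."""
    lines = [question]
    lines += [f"{LABELS[i]}. {opt}" for i, opt in enumerate(options)]
    lines.append("Answer:")
    return "\n".join(lines)


def permutation_attack(
    question: str,
    options: Sequence[str],
    correct_index: int,
    predict: Callable[[str], str],  # hypothetical model wrapper: prompt -> letter
) -> Optional[Tuple[int, ...]]:
    """Search all k! orderings; return the first one the model gets wrong, else None."""
    for perm in permutations(range(len(options))):
        shuffled = [options[i] for i in perm]
        # The correct option moves to the position where its original index appears.
        target = LABELS[perm.index(correct_index)]
        if predict(format_prompt(question, shuffled)) != target:
            return perm
    return None


def always_a(prompt: str) -> str:
    """Toy stand-in for a position-biased model: always picks the first option."""
    return "A"


if __name__ == "__main__":
    fooling = permutation_attack(
        "What is the capital of France?",
        ["Paris", "Rome", "Berlin"],
        correct_index=0,
        predict=always_a,
    )
    print("fooling permutation:", fooling)
```

The toy model answers correctly whenever the right option happens to sit in the first slot, so it looks accurate on the original ordering yet is fooled as soon as the search moves the correct option elsewhere. A permutation-invariant model would make the function return `None`.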
