Fool Your (Vision and) Language Model With Embarrassingly Simple Permutations

Yongshuo Zong; Tingyang Yu; Ruchika Chavhan; Bingchen Zhao; Timothy Hospedales

TL;DR

This paper analyzes how multiple-choice question answering (MCQA) performance collapses when the answer options are permuted, across both large language models (LLMs) and vision-language models (VLLMs). It highlights a systemic vulnerability whose attack space grows with the number of possible option orderings, $k!$ for $k$ options. The authors run a broad empirical campaign across model families and datasets, demonstrating that permutation attacks dramatically reduce accuracy, sometimes below random chance, and that answer-set pruning offers little protection. They investigate causes including position bias and symbol-content shortcuts, and assess mitigations such as post-hoc voting and permutation-aware fine-tuning, finding limited success for the former but some promise for training-time defenses. The work warns against overreliance on MCQA benchmarks, motivates the development of permutation-robust architectures and evaluation protocols, and provides a public codebase for replication.

Abstract

Large language and vision-language models are rapidly being deployed in practice thanks to their impressive capabilities in instruction following, in-context learning, and so on. This raises an urgent need to carefully analyse their robustness so that stakeholders can understand if and when such models are trustworthy enough to be relied upon in any given application. In this paper, we highlight a specific vulnerability in popular models, namely permutation sensitivity in multiple-choice question answering (MCQA). Specifically, we show empirically that popular models are vulnerable to adversarial permutation in answer sets for multiple-choice prompting, which is surprising as models should ideally be as invariant to prompt permutation as humans are. These vulnerabilities persist across various model sizes, and exist in very recent language and vision-language models. Code is available at https://github.com/ys-zong/FoolyourVLLMs.
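The adversarial permutation described in the abstract can be illustrated with a minimal sketch. It is not taken from the authors' codebase: the prompt template, the function names, and the two toy predictors are hypothetical stand-ins, and `predict` is assumed to return the position (0-based) of the option the model selects. The sketch simply enumerates all $k!$ orderings of the options and reports the first one on which the prediction is wrong.

```python
from itertools import permutations
from typing import Callable, Optional, Sequence, Tuple

SYMBOLS = "ABCDEFGH"  # assumed option symbols; the template below is illustrative


def format_prompt(question: str, options: Sequence[str]) -> str:
    """Render a question and its options as a simple MCQA prompt."""
    lines = [question]
    lines += [f"{SYMBOLS[i]}. {opt}" for i, opt in enumerate(options)]
    lines.append("Answer:")
    return "\n".join(lines)


def permutation_attack(
    question: str,
    options: Sequence[str],
    answer_idx: int,
    predict: Callable[[str], int],
) -> Optional[Tuple[int, ...]]:
    """Return the first option ordering that makes `predict` wrong, else None.

    `order[j]` is the original index of the option shown at position j.
    """
    for order in permutations(range(len(options))):
        shuffled = [options[i] for i in order]
        correct_pos = order.index(answer_idx)  # where the true answer now sits
        if predict(format_prompt(question, shuffled)) != correct_pos:
            return order
    return None


def always_first(prompt: str) -> int:
    """Toy predictor with extreme position bias: always picks the first option."""
    return 0


def content_oracle(prompt: str) -> int:
    """Toy predictor that reads the option text and ignores position."""
    option_lines = prompt.split("\n")[1:-1]
    for pos, line in enumerate(option_lines):
        if line.endswith("Paris"):
            return pos
    return -1


if __name__ == "__main__":
    q = "What is the capital of France?"
    opts = ["Paris", "Rome", "Oslo", "Bonn"]
    print(permutation_attack(q, opts, 0, always_first))
    print(permutation_attack(q, opts, 0, content_oracle))
```

A position-biased predictor is correct on the original ordering but fails as soon as the answer moves, whereas a predictor that depends only on option content survives every ordering. With four options the loop visits at most 24 orderings; the cost grows factorially with the number of options.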

Paper Structure (18 sections, 1 equation, 4 figures, 32 tables)

Introduction
Simple Adversarial Attack Breaks LLMs and VLLMs
Experiment Setup
Main Results
Answer Set Pruning
Understanding Vulnerability Causes
Position Bias and Other Attacks
Symbol-Content Spurious Correlation
Exploring Mitigation Strategies
Post-hoc Mitigation Strategies
Fine-tuning on Training Set
Related Work
Discussion
Appendix
Additional Results on Answer Set Pruning
...and 3 more sections

Figures (4)

Figure 1: a. Schematic Illustration of an MCQA permutation attack. b. Summary of MCQA adversarial attack results for both LLMs and VLLMs. The values are average accuracy across all benchmarking datasets.
Figure 2: Correlation of the Llama2-13B model's predictions across pairs of option-symbol sets, computed for each permutation. The low correlation between predictions made with capital letters and those made with Roman numerals suggests that the model may have learned shortcuts or spurious correlations linking option symbols with answer content.
Figure 3: Qualitative results of permutations of answer options and the corresponding model (Otter-Llama) predictions. The example is selected from the ScienceQA dataset.
Figure 4: Analysis on permutation distribution. The histogram shows the number of questions for which the corresponding proportion of permutations leads to the correct answer (ideal is a full bar at the 100% bin, indicating that all permutations are correctly answered for all questions). The distribution of bins suggests that many questions have multiple adversarial permutations.
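
The histogram described in the Figure 4 caption can be computed along the following lines. This is a sketch under stated assumptions rather than the authors' plotting code: the function name, the input layout (one list of per-permutation correctness flags per question, each non-empty), and the binning scheme are all hypothetical. The last bin is reserved for questions on which every permutation is answered correctly.

```python
from typing import List, Sequence


def correct_proportion_histogram(
    results: Sequence[Sequence[bool]], n_bins: int = 10
) -> List[int]:
    """Count questions by the fraction of permutations answered correctly.

    `results[q][p]` is True if permutation p of question q was answered
    correctly. Returns n_bins + 1 counts: bin i (for i < n_bins) covers
    fractions in [i / n_bins, (i + 1) / n_bins), and the final bin holds
    questions where all permutations are correct (exactly 100%).
    """
    hist = [0] * (n_bins + 1)
    for per_question in results:
        n_correct = sum(per_question)
        total = len(per_question)  # assumed non-empty
        # Integer arithmetic avoids floating-point edge cases at bin borders.
        hist[(n_correct * n_bins) // total] += 1
    return hist


if __name__ == "__main__":
    toy = [
        [True, True, True, True],      # robust to every permutation
        [True, False, False, False],   # three adversarial permutations
        [False, False, False, False],  # always wrong
        [True, True, False, False],    # two adversarial permutations
    ]
    print(correct_proportion_histogram(toy, n_bins=4))
```

Under this layout, an ideal model would place every question in the final bin; any mass in the lower bins corresponds to questions with at least one adversarial permutation.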
