FuzzingRL: Reinforcement Fuzz-Testing for Revealing VLM Failures

Jiajun Xu, Jiageng Mao, Ang Qi, Weiduo Yuan, Alexander Romanus, Helen Xia, Vitor Campagnolo Guizilini, Yue Wang

TL;DR

This paper proposes an approach that automatically generates questions designed to induce incorrect responses from VLMs, thereby revealing their vulnerabilities. The approach consistently drives down a target VLM's answer accuracy.

Abstract

Vision Language Models (VLMs) are prone to errors, and identifying where these errors occur is critical for ensuring the reliability and safety of AI systems. In this paper, we propose an approach that automatically generates questions designed to deliberately induce incorrect responses from VLMs, thereby revealing their vulnerabilities. The core of this approach lies in fuzz testing and reinforcement fine-tuning: we transform a single input query into a large set of diverse variants through vision and language fuzzing. Based on the fuzzing outcomes, the question generator is further instructed by adversarial reinforcement fine-tuning to produce increasingly challenging queries that trigger model failures. With this approach, we can consistently drive down a target VLM's answer accuracy. For example, the accuracy of Qwen2.5-VL-32B on our generated questions drops from 86.58% to 65.53% in four RL iterations. Moreover, a fuzzing policy trained against a single target VLM transfers to multiple other VLMs, producing challenging queries that degrade their performance as well.

Paper Structure (14 sections, 10 equations, 4 figures, 5 tables)

Figures (4)

  • Figure 1: FuzzingRL probe examples. For a given capability subdimension $d$, each panel shows an answerable probe generated by a specific fuzzing role and the answer from the target model (Qwen2.5-VL-32B).
  • Figure 2: Overview of FuzzingRL. A fuzzing model (e.g., Qwen2.5-VL-7B) samples from an image base organized by 24 subdimensions and 8 fuzzing roles to generate diverse, error-prone questions for a target VLM. The target’s responses are scored via reward calculation, using a GPT-4o judge when confidence is high and a human judge otherwise, and the resulting preference pairs are used for DPO training to update the fuzzing model. The bottom panel illustrates how iterative training progressively sharpens the generated queries from ordinary perception questions to more failure-inducing, compositional prompts, making the fuzzing model more likely to elicit incorrect answers from VLMs.
  • Figure 3: FR by iteration with transfer evaluation. FR ($=1-\mathrm{Acc}$) over training iterations on the target model (Qwen2.5-VL-32B) and three held-out test VLMs. FR on the target increases steadily, whereas transfer FR peaks around iteration 4 and may drop with further training, so we stop at iteration 4.
  • Figure 4: Cross-model results of the trained fuzzing model. Example probes and test-model outputs showing four recurrent failure families: (a) part-whole/compositional proximity, (b) counting under clutter, (c) discourse/negation in scenario recognition, and (d) hypothetical object-presence reasoning.
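
The captions for Figures 2 and 3 mention two quantities that a small sketch may make concrete: the failure metric FR = 1 - Acc, and the preference pairs passed to DPO training. The Python below is only an illustration under stated assumptions, not the authors' implementation. The `Probe` type, the function names, and the pairing rule (a question the target answered incorrectly is preferred over one it answered correctly) are all hypothetical; the page does not specify how pairs are formed.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Probe:
    """Hypothetical record: one generated question and its judged outcome."""
    question: str
    target_correct: bool  # True if the judge marked the target's answer correct


def failure_rate(accuracy: float) -> float:
    """FR = 1 - Acc, as defined in the Figure 3 caption (accuracy in [0, 1])."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    return 1.0 - accuracy


def build_preference_pairs(probes: list[Probe]) -> list[dict[str, str]]:
    """Assumed pairing rule: failure-inducing questions are 'chosen',
    correctly answered questions are 'rejected'."""
    chosen = [p.question for p in probes if not p.target_correct]
    rejected = [p.question for p in probes if p.target_correct]
    return [{"chosen": c, "rejected": r} for c, r in product(chosen, rejected)]


if __name__ == "__main__":
    # Accuracies reported in the abstract for Qwen2.5-VL-32B.
    for acc in (0.8658, 0.6553):
        print(f"Acc = {acc:.4f} -> FR = {failure_rate(acc):.4f}")
```

Applied to the accuracies in the abstract, the drop from 86.58% to 65.53% corresponds to FR rising from 13.42% to 34.47%.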