Fooling the LVLM Judges: Visual Biases in LVLM-Based Evaluation

Yerin Hwang, Dongryeol Lee, Kyungmin Min, Taegwan Kang, Yong-il Kim, Kyomin Jung

TL;DR

Large vision-language model (LVLM) judges are widely used to assess text-image alignment in text-to-image (T2I) generation, but their robustness to visual perturbations has gone largely unexamined. This work defines a taxonomy of visual biases, introduces FRAME, a controllable multi-domain benchmark, and demonstrates that LVLM judges inflate scores for biased images across nine models and five domains. All eight bias types, Instruction Overlay in particular, systematically inflate the judges' scores, and prompt-based mitigation reduces the bias only partially. The findings underscore the need for robust LVLM judging frameworks and bias-resistant evaluation protocols to avoid misleading rewards and degraded alignment.

Abstract

Recently, large vision-language models (LVLMs) have emerged as the preferred tools for judging text-image alignment, yet their robustness along the visual modality remains underexplored. This work is the first study to address a key research question: Can adversarial visual manipulations systematically fool LVLM judges into assigning unfairly inflated scores? We define potential image-induced biases within the context of text-to-image (T2I) evaluation and examine how these biases affect the evaluations of LVLM judges. Moreover, we introduce a novel, fine-grained, multi-domain meta-evaluation benchmark named FRAME, which is deliberately constructed to exhibit diverse score distributions. By introducing the defined biases into the benchmark, we reveal that all tested LVLM judges exhibit vulnerability across all domains, consistently inflating scores for manipulated images. Further analysis reveals that combining multiple biases amplifies their effects, and pairwise evaluations are similarly susceptible. Moreover, we observe that visual biases persist under prompt-based mitigation strategies, highlighting the vulnerability of current LVLM evaluation systems and underscoring the urgent need for more robust LVLM judges.

Paper Structure

This paper contains 40 sections, 9 figures, 12 tables.

Figures (9)

  • Figure 1: The LVLM judge is influenced by visual manipulations, resulting in an unfairly inflated evaluation score. Embedding the image generation instruction in the image (left) produces a manipulated image (right), leading to unfair assessment.
  • Figure 2: Impact of visual biases across all LVLM judges. Left: Average attack success rates across five domains and eight types of visual bias. An attack is considered successful when the LVLM assigns a higher average score to the biased images than to the original counterparts. Right: Average percentage increase in score for successful attacks, reflecting the magnitude of the visual bias effect.
  • Figure 3: Pairwise evaluation of group A vs. group B. Top: original results. Bottom: results after applying the Instruction Overlay bias to group A.
  • Figure 4: Prompt template used for single-image scoring evaluations reported in Table \ref{table_main}.
  • Figure 5: Prompt template used for bias-aware prompting methods reported in Table \ref{tab:prompting}.
  • ...and 4 more figures
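
The two quantities described in the Figure 2 caption can be illustrated with a short sketch. This is not the authors' evaluation code. It assumes that each evaluation setting (for example, one judge, one domain, and one bias type) supplies a list of scores for the original images and a list for their biased counterparts, and that scores lie on a strictly positive scale. The function name `attack_metrics` and the input layout are hypothetical.

```python
from statistics import mean


def attack_metrics(settings):
    """Compute attack success rate and mean percentage score increase.

    `settings` is a list of (original_scores, biased_scores) pairs, one pair
    per evaluation setting. This layout is an assumption for illustration.
    """
    increases = []
    for original_scores, biased_scores in settings:
        original_avg = mean(original_scores)
        biased_avg = mean(biased_scores)
        if original_avg <= 0:
            # Percentage increase is undefined for a non-positive baseline.
            raise ValueError("original scores must average above zero")
        # Per the Figure 2 caption: an attack succeeds when the biased images
        # receive a higher average score than the original images.
        if biased_avg > original_avg:
            increases.append((biased_avg - original_avg) / original_avg * 100.0)

    success_rate = len(increases) / len(settings)
    # The percentage increase is averaged over successful attacks only.
    avg_increase = mean(increases) if increases else 0.0
    return success_rate, avg_increase


# Toy scores, invented purely for illustration.
toy_settings = [
    ([2, 3, 4], [3, 4, 5]),  # average rises from 3 to 4: success
    ([4, 4], [4, 3]),        # average falls from 4 to 3.5: failure
    ([1, 1], [2, 2]),        # average rises from 1 to 2: success
]
rate, increase = attack_metrics(toy_settings)
print(f"success rate: {rate:.3f}, mean increase on successes: {increase:.1f}%")
```

Averaging the percentage increase over successful attacks only, as the caption states, separates how often a bias works from how strongly it shifts the score when it does.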