Measuring Agreeableness Bias in Multimodal Models

Jaehyuk Lim, Bruce W. Lee

TL;DR

Pre-marking an option in the image of a multiple-choice question significantly shifts multimodal models' responses toward that option, even when it contradicts the answer they give in the neutral setting. This raises important questions about their use in critical decision-making contexts where such visual cues might be present.

Abstract

This paper examines a phenomenon in multimodal language models where pre-marked options in question images can significantly influence model responses. Our study employs a systematic methodology to investigate this effect: we present models with images of multiple-choice questions, which they initially answer correctly, then expose the same model to versions with pre-marked options. Our findings reveal a significant shift in the models' responses towards the pre-marked option, even when it contradicts their answers in the neutral settings. Comprehensive evaluations demonstrate that this agreeableness bias is a consistent and quantifiable behavior across various model architectures. These results show potential limitations in the reliability of these models when processing images with pre-marked options, raising important questions about their application in critical decision-making contexts where such visual cues might be present.
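
The evaluation protocol described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the `Trial` record, its field names, and the `flip_rate` function are hypothetical, and the sketch assumes each question yields one answer in the neutral setting and one answer in the pre-marked setting. It keeps only questions the model answered correctly when neutral and where the marked option differs from that answer, then reports the fraction of those where the model switched to the marked option.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One question evaluated twice (hypothetical record layout)."""
    correct: str          # ground-truth option label, e.g. "A"
    neutral_answer: str   # model's answer to the unmarked image
    marked_option: str    # option that was pre-marked in the biased image
    biased_answer: str    # model's answer to the pre-marked image


def flip_rate(trials):
    """Fraction of eligible trials where the model adopts the marked option.

    A trial is eligible when the neutral answer was correct and the
    pre-marked option contradicts that neutral answer.
    """
    eligible = [
        t for t in trials
        if t.neutral_answer == t.correct and t.marked_option != t.neutral_answer
    ]
    if not eligible:
        return 0.0
    flipped = sum(t.biased_answer == t.marked_option for t in eligible)
    return flipped / len(eligible)
```

Restricting the denominator to initially correct answers separates the effect of the visual mark from ordinary model error; whether the paper reports exactly this ratio is an assumption.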

Paper Structure (7 sections, 6 figures, 2 tables)

Figures (6)

  • Figure 1: A sample of HTML-rendered vMMLU prompt, neutral
  • Figure 2: A sample of HTML-rendered vMMLU prompt, option C bias
  • Figure 3: A sample of HTML-rendered vSocialIQa prompt, neutral
  • Figure 4: A sample of HTML-rendered vSocialIQa prompt, option B bias
  • Figure 5: Average change in linear probability between neutral and biased prompts for vMMLU (top row) and vSocialIQa (bottom row). The left column represents highlight bias. The top right plot displays size bias, and the bottom right plot shows highlight bias in a typical webpage format, where black text is highlighted in light blue. The type of bias strongly correlates with increased token probability for the corresponding answer choice.
  • ...and 1 more figure
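
The quantity plotted in Figure 5, the average change in linear probability between neutral and biased prompts, can be sketched as follows. This is an illustrative reading of the caption, not the authors' implementation: the function names and the record layout (a marked option plus per-option log-probabilities for each prompt) are assumptions. Log-probabilities are exponentiated to linear probabilities, differenced per answer option, and averaged over all questions that share the same marked option.

```python
import math
from collections import defaultdict


def linear_probs(logprobs):
    """Convert a {option: log-probability} mapping to linear probabilities."""
    return {option: math.exp(lp) for option, lp in logprobs.items()}


def mean_probability_shift(records):
    """Average (biased - neutral) probability per option, grouped by marked option.

    records: iterable of (marked_option, neutral_logprobs, biased_logprobs),
    where each logprobs value maps option labels to log-probabilities.
    Returns {marked_option: {option: mean change in linear probability}}.
    """
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for marked, neutral_lp, biased_lp in records:
        neutral = linear_probs(neutral_lp)
        biased = linear_probs(biased_lp)
        for option, p_neutral in neutral.items():
            # Options missing from the biased output count as probability 0.
            totals[marked][option] += biased.get(option, 0.0) - p_neutral
        counts[marked] += 1
    return {
        marked: {opt: total / counts[marked] for opt, total in per_option.items()}
        for marked, per_option in totals.items()
    }
```

Under this reading, a positive entry where the option equals the marked option means the mark raised the probability of that answer choice.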