Don't Deceive Me: Mitigating Gaslighting through Attention Reallocation in LMMs

Pengkun Jiao, Bin Zhu, Jingjing Chen, Chong-Wah Ngo, Yu-Gang Jiang

TL;DR

This work addresses the vulnerability of large multimodal models to negation-based gaslighting by introducing GasEraser, a training-free attention-reallocation method that shifts emphasis from misleading textual tokens to visually grounded image regions. By identifying visual-attention sinks and selecting vision-centric heads, GasEraser reweights attention maps during inference without retraining, yielding substantial robustness gains on GaslightingBench across open-source LMMs. The approach demonstrates that gaslighting signals are predominantly text-based and that early-layer visual processing is critical for robust grounding, with the top 16 layers offering the most benefit. Overall, GasEraser offers a practical, plug-in solution for more trustworthy multimodal reasoning in the face of adversarial or misleading prompts.

Abstract

Large Multimodal Models (LMMs) have demonstrated remarkable capabilities across a wide range of tasks. However, their vulnerability to user gaslighting (the deliberate use of misleading or contradictory inputs) raises critical concerns about their reliability in real-world applications. In this paper, we address the novel and challenging issue of mitigating the negative impact of negation-based gaslighting on LMMs, where deceptive user statements lead to significant drops in model accuracy. Specifically, we introduce GasEraser, a training-free approach that reallocates attention weights from misleading textual tokens to semantically salient visual regions. By suppressing the influence of "attention sink" tokens and enhancing focus on visually grounded cues, GasEraser significantly improves LMM robustness without requiring retraining or additional supervision. Extensive experimental results demonstrate that GasEraser is effective across several leading open-source LMMs on GaslightingBench. Notably, for LLaVA-v1.5-7B, GasEraser reduces the misguidance rate by 48.2%, demonstrating its potential for more trustworthy LMMs.
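
The reallocation idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `reallocate_attention`, the fraction `alpha`, and the proportional redistribution rule are all assumptions made for illustration. It operates on a single attention row (one query token's weights over all key tokens) and moves part of the mass from positions judged misleading to positions judged visually salient, keeping the row normalized.

```python
def reallocate_attention(weights, source_idx, target_idx, alpha=0.5):
    """Move a fraction `alpha` of attention mass from source to target positions.

    weights    : one attention row (non-negative floats summing to 1).
    source_idx : positions to suppress (e.g. misleading text or sink tokens).
    target_idx : positions to boost (e.g. salient visual tokens).
    alpha      : hypothetical strength in [0, 1]; not a value from the paper.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    source, target = set(source_idx), set(target_idx)
    if source & target:
        raise ValueError("source and target positions must not overlap")
    if not target:
        # Nothing to boost: leave the row unchanged so it still sums to 1.
        return list(weights)

    out = list(weights)
    moved = 0.0
    for i in source:
        moved += alpha * out[i]      # mass taken away from this position
        out[i] *= 1.0 - alpha

    # Hand the removed mass to the targets, proportionally to their
    # original weights (uniformly if they had no weight at all).
    target_mass = sum(weights[i] for i in target)
    for i in target:
        share = weights[i] / target_mass if target_mass > 0 else 1.0 / len(target)
        out[i] += moved * share
    return out


row = [0.5, 0.2, 0.2, 0.1]           # position 0 plays the role of a sink
new_row = reallocate_attention(row, source_idx=[0], target_idx=[2, 3])
print(new_row, sum(new_row))
```

Because the mass removed from the source positions is exactly the mass added to the targets, the row remains a valid probability distribution and no renormalization step is needed. Which heads and layers such a rule should be applied to is a separate selection problem that this sketch does not address.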

Paper Structure

This paper contains 30 sections, 12 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Illustration of negation-based gaslighting in Large Multimodal Models (LMMs). A negation-based gaslighting statement is a misleading user prompt that contradicts the model's initially correct answer (e.g., “There are two pineapples in the image,” when only one is present). The figure demonstrates how such deceptive inputs can override the model’s initially accurate response, leading it to adopt the false premise.
  • Figure 2: Performance comparison of three models on GaslightingBench, highlighting the impact of negation-based gaslighting and the effectiveness of the proposed GasEraser. The figure shows the models' accuracy under three conditions: before negation, after negation for base LMMs, and after negation with GasEraser applied to the base LMMs.
  • Figure 3: (a) The image-relevant token attends to both key and some irrelevant visual features. (b) Gaslighting tokens primarily focus on irrelevant visual features. (c) and (e) show normal token embeddings for LLaVA-v1.5-7B and InternVL2-8B, while (d) and (f) show the corresponding sink token embeddings, which exhibit significantly higher norms in specific dimensions.
  • Figure 4: Illustration of our GasEraser. (a) Multi-head attention applies multiple attention mechanisms in parallel, allowing the model to capture different perspectives of the information. (b) We evaluate the relevance between image and text tokens to identify which visual-textual associations are important. (c) We then reallocate attention from less important associations that have high attention scores to those that are more relevant.
  • Figure 5: Qualitative examples using LLaVA-v1.5-7B as the base model. The base model generates incorrect answers when misled by gaslighting negation statements, whereas our method effectively mitigates the impact of such misleading content. The ground truth option is highlighted in green.
  • ...and 2 more figures
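
The Figure 3 caption notes that sink token embeddings exhibit much higher norms in specific dimensions than normal tokens. One way such tokens might be flagged is sketched below. This is an assumption-laden illustration, not the paper's procedure: the function name `find_sink_tokens`, the choice of `sink_dims`, and the threshold `tau` are hypothetical, and the rule (peak magnitude in the chosen dimensions relative to the embedding's RMS) is one plausible reading of the caption.

```python
import math


def find_sink_tokens(hidden_states, sink_dims, tau=2.0):
    """Return indices of tokens whose embedding spikes in `sink_dims`.

    hidden_states : list of token embeddings (each a list of floats).
    sink_dims     : hypothetical dimensions in which sinks are assumed to spike.
    tau           : hypothetical threshold on peak / RMS. The ratio can never
                    exceed sqrt(embedding size), so tau must be chosen below
                    that bound to flag anything.
    """
    sinks = []
    for idx, vec in enumerate(hidden_states):
        rms = math.sqrt(sum(x * x for x in vec) / len(vec))
        if rms == 0.0:
            continue  # an all-zero embedding cannot spike
        peak = max(abs(vec[d]) for d in sink_dims)
        if peak / rms >= tau:
            sinks.append(idx)
    return sinks


normal = [0.1, -0.2, 0.15, 0.1, -0.1, 0.2, 0.05, -0.15]
spiky = [0.1, 0.1, 0.1, 10.0, 0.1, 0.1, 0.1, 0.1]   # large value in dimension 3
print(find_sink_tokens([normal, spiky, normal], sink_dims=[3]))
```

In a real model the embeddings would come from a chosen layer's hidden states, and the dimensions and threshold would need to be determined empirically for each architecture.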