Dynamic Multimodal Activation Steering for Hallucination Mitigation in Large Vision-Language Models

Jianghao Yin; Qin Chen; Kedi Chen; Jie Zhou; Xingjiao Wu; Liang He

TL;DR

Dynamic Multimodal Activation Steering (DMAS) is a training-free method for mitigating hallucination in Large Vision-Language Models. It builds a semantic-based truthfulness steering vector database and computes visual perception steering vectors. At inference, it selects the steering vectors most relevant to the input by semantic similarity and applies them to the most influential attention heads.

Abstract

Large Vision-Language Models (LVLMs) exhibit outstanding performance on vision-language tasks but struggle with hallucination problems. Through in-depth analysis of LVLM activation patterns, we reveal two key findings: 1) truthfulness and visual perception capabilities predominantly engage different subsets of attention heads within the model architecture; and 2) truthfulness steering vectors vary significantly across different semantic contexts. Based on these observations, we propose Dynamic Multimodal Activation Steering, a training-free approach for hallucination mitigation. Our method constructs a semantic-based truthfulness steering vector database and computes visual perception steering vectors, enabling context-aware interventions during inference by dynamically selecting the most relevant steering vectors based on input semantic similarity and applying them to the most influential attention heads. We conduct comprehensive experiments across multiple models and datasets, demonstrating that our approach significantly enhances model performance, outperforming existing state-of-the-art methods.
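
The inference-time step described in the abstract (retrieve the most semantically similar truthfulness vectors, then intervene on selected attention heads) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the dictionary layout of the database, the use of cosine similarity, the additive form of the intervention, and the strengths `alpha` and `beta` are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def select_entry(query_emb, database):
    """Return the database entry whose semantic key is closest to the query."""
    return max(database, key=lambda entry: cosine(query_emb, entry["key"]))

def steer(activations, truth_vecs, visual_vecs, truth_heads, visual_heads,
          alpha=1.0, beta=1.0):
    """Add steering vectors to the chosen heads; other heads are unchanged.

    Assumption: the intervention is a scaled addition, as in common
    activation-steering setups. The paper's exact rule may differ.
    """
    steered = {head: list(act) for head, act in activations.items()}
    for head in truth_heads:
        steered[head] = [a + alpha * v
                         for a, v in zip(steered[head], truth_vecs[head])]
    for head in visual_heads:
        steered[head] = [a + beta * v
                         for a, v in zip(steered[head], visual_vecs[head])]
    return steered

# Toy data: heads are (layer, head) pairs with 2-dimensional activations.
database = [
    {"key": [1.0, 0.0], "vectors": {(0, 0): [0.5, 0.0]}},
    {"key": [0.0, 1.0], "vectors": {(0, 0): [0.0, 0.5]}},
]
visual_vecs = {(0, 1): [0.1, 0.1]}
activations = {(0, 0): [1.0, 1.0], (0, 1): [2.0, 2.0], (1, 0): [3.0, 3.0]}

# The query embedding is closest to the first key, so its vectors are used.
entry = select_entry([0.9, 0.1], database)
steered = steer(activations, entry["vectors"], visual_vecs,
                truth_heads=[(0, 0)], visual_heads=[(0, 1)])
```

In this toy setup, truthfulness and visual perception vectors are applied to different heads, mirroring the first finding in the abstract; how the influential heads are chosen is left outside the sketch.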

Paper Structure (41 sections, 6 equations, 6 figures, 14 tables)

Introduction
Related Work
Large Vision-Language Models
Hallucination Mitigation for LVLMs
Preliminary Study
Method
Truthfulness Steering Vector Database
Visual Perception Steering Vector
Dynamic Intervention for Inference
Experimental Setup
Datasets and Evaluation Metrics
MME
POPE
CHAIR
Baselines and Implementation Details
...and 26 more sections

Figures (6)

Figure 1: Activation differences in LLaVA-v1.5.
Figure 2: Overview of the DMAS framework.
Figure 3: Impact of key hyperparameters.
Figure 4: Effect of dynamic intervention.
Figure 5: Effect of dynamic intervention.
...and 1 more figure
