SteerVLM: Robust Model Control through Lightweight Activation Steering for Vision Language Models

Anushka Sivakumar; Andrew Zhang; Zaber Hakim; Chris Thomas

TL;DR

SteerVLM addresses the problem of steering vision-language models toward targeted outputs without fine-tuning. It introduces a lightweight, inference-time steering module that operates on latent activations. The module comprises a shared Steerer and SteeringGate placed after each language-model attention layer; it uses target and converse prompts to compute a delta that is added to the activations, $z_l = x_l + \lambda \bar{x}_l$, giving dimension-wise, token-specific control. A new multimodal dataset, VNIA, supports training and evaluation of steering in VLMs. Experiments show state-of-the-art zero-shot performance on hallucination mitigation (OHD) and strong topic-steering performance on VNIA, outperforming prior interventions by significant margins. The approach offers robust, generalizable multimodal model control with minimal parameter overhead, enabling safer and more controllable VLM outputs in real-time applications. The authors acknowledge limitations, including reliance on dataset synthesis and additional forward-pass requirements.
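The update rule $z_l = x_l + \lambda \bar{x}_l$ can be illustrated with a small sketch. This is not the authors' implementation: the summary does not specify the internals of the Steerer or the SteeringGate, so the code below assumes (purely for illustration) that the Steerer is replaced by a plain difference between a target and a converse prompt vector, and that the SteeringGate is a sigmoid applied to one hypothetical logit per token and dimension. The names `steering_delta`, `apply_steering`, and `gate_logits` are likewise hypothetical.

```python
import math


def sigmoid(v: float) -> float:
    """Logistic function, used here as a stand-in gate."""
    return 1.0 / (1.0 + math.exp(-v))


def steering_delta(target, converse, gate_logits):
    """Build a per-token, per-dimension delta (the x-bar term).

    ASSUMPTION: the learned Steerer is replaced by a simple
    target-minus-converse difference, and the SteeringGate by a
    sigmoid over hypothetical logits (one per token and dimension).
    """
    direction = [t - c for t, c in zip(target, converse)]
    return [
        [sigmoid(g) * d for g, d in zip(token_logits, direction)]
        for token_logits in gate_logits
    ]


def apply_steering(x, delta, lam):
    """Compute z = x + lam * delta for every token and dimension."""
    return [
        [xi + lam * di for xi, di in zip(x_row, d_row)]
        for x_row, d_row in zip(x, delta)
    ]


# Toy activations: 2 tokens, hidden size 3.
x = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]
target = [1.0, 0.0, -1.0]      # embedding of the target prompt (toy values)
converse = [0.0, 0.0, 1.0]     # embedding of the converse prompt (toy values)
gate_logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # sigmoid(0) = 0.5 everywhere

delta = steering_delta(target, converse, gate_logits)
z = apply_steering(x, delta, lam=2.0)
print(z)
```

Setting `lam` to zero leaves the activations unchanged, which reflects the role of $\lambda$ as a steering-strength knob. Varying the gate logits per token and per dimension is what would make the intervention token-specific and dimension-wise in this toy setting.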

Abstract

This work introduces SteerVLM, a lightweight steering module designed to guide Vision-Language Models (VLMs) towards outputs that better adhere to desired instructions. Our approach learns from the latent embeddings of paired prompts encoding target and converse behaviors to dynamically adjust activations connecting the language modality with image context. This allows for fine-grained, inference-time control over complex output semantics without modifying model weights while preserving performance on off-target tasks. Our steering module requires learning parameters equal to 0.14% of the original VLM's size. Our steering module gains model control through dimension-wise activation modulation and adaptive steering across layers without requiring pre-extracted static vectors or manual tuning of intervention points. Furthermore, we introduce VNIA (Visual Narrative Intent Alignment), a multimodal dataset specifically created to facilitate the development and evaluation of VLM steering techniques. Our method outperforms existing intervention techniques on steering and hallucination mitigation benchmarks for VLMs and proposes a robust solution for multimodal model control through activation engineering.
