Nudging Hidden States: Training-Free Model Steering for Chain-of-Thought Reasoning in Large Audio-Language Models

Lok-Lam Ieong; Chia-Chien Chen; Chih-Kai Yang; Yu-Han Huang; An-Yu Cheng; Hung-yi Lee

Abstract

Chain-of-thought (CoT) prompting has been extended to large audio-language models (LALMs) to elicit reasoning, yet enhancing its effectiveness without training remains challenging. We study inference-time model steering as a training-free approach to improving LALM reasoning. We introduce three strategies that draw on diverse information sources and evaluate them across four LALMs and four benchmarks. Results show general accuracy gains of up to 4.4% over CoT prompting. Notably, we identify a cross-modal transfer effect in which steering vectors derived from a few text samples effectively guide speech-based reasoning, demonstrating high data efficiency. We also examine hyperparameter sensitivity to understand the robustness of these approaches. Our findings position model steering as a practical direction for strengthening LALM reasoning.

Paper Structure (18 sections, 8 equations, 4 figures, 2 tables)

Introduction
Methodology
Extraction Phase
Vanilla Steering
Speech-derived Generalized Steering (SGS)
Text-derived Generalized Steering (TGS)
Injection Phase
Experimental Setups
Models
Baselines
Datasets and Evaluation Benchmarks
Results
Can Model Steering Improve Chain-of-thought?
Hyperparameter Sensitivity of Steering Methods
Data Efficiency of SGS and TGS
...and 3 more sections

Figures (4)

Figure 1: Example of reasoning enhanced by steering.
Figure 2: Overview of our three proposed methods in the extraction phase, along with the subsequent injection phase.
Figure 3: Hyperparameter sensitivity of the steering methods.
Figure 4: Effect of external dataset size for SGS/TGS on Voxtral.
