Inference-time Alignment via Sparse Junction Steering

Runyi Hu, Jie Zhang, Shiqian Zhao, Jiale Meng, Jiwei Li, Jason Zeng, Ming Wu, Michael Heinrich, Yonggang Wen, Tianwei Zhang

TL;DR

This work shows that dense intervention is unnecessary and proposes Sparse Inference-time Alignment (SIA), which performs sparse junction steering by intervening only at critical decision points along the generation trajectory, reducing computational cost by up to 6x.

Abstract

Token-level steering has emerged as a pivotal approach for inference-time alignment, enabling fine-grained control over large language models by modulating their output distributions without parameter updates. While effective, existing methods rely on dense intervention at every decoding step. This persistent manipulation not only incurs substantial computational overhead but also risks compromising generation quality by drifting excessively from the model's intrinsic distribution. In this work, we show that dense intervention is unnecessary and propose Sparse Inference-time Alignment (SIA), which performs sparse junction steering by intervening only at critical decision points along the generation trajectory. Our key insight is that high-entropy junctions mark pivotal decision points in the generation trajectory and are particularly susceptible to misalignment, indicating the need to introduce alignment-related reward signals at these points. Extensive experiments across different model families and alignment objectives show that steering only 20% to 80% of tokens achieves superior alignment-efficiency trade-offs. For strong base models such as Qwen3, intervening on as few as 20% of tokens matches or even surpasses heavily post-trained instruct models. This sparsity enables stronger guidance while better preserving the model's native distribution, integrates seamlessly with search-based methods such as Best-of-N, and reduces computational cost by up to 6x.
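The gating idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the fixed entropy threshold, and the additive rule `logit + beta * reward` are all assumptions made for illustration (`beta` is named after the weighting factor mentioned in the Figure 4 caption). The point it shows is that the reward signal is only queried at high-entropy steps, so low-entropy steps cost nothing extra.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def steer_step(base_logits, reward_fn, threshold, beta):
    """One decoding step with entropy-gated steering (hypothetical helper).

    base_logits: logits of the base model over the candidate tokens.
    reward_fn:   zero-argument callable returning one reward per candidate;
                 it is only called when the step is judged a junction.
    threshold:   assumed fixed entropy cutoff for declaring a junction.
    beta:        assumed weight on the reward; larger values let the
                 reward dominate the base distribution.
    Returns (token distribution, whether the step was steered).
    """
    probs = softmax(base_logits)
    if entropy(probs) < threshold:
        # Confident step: keep the base distribution and skip the reward call.
        return probs, False
    rewards = reward_fn()
    steered = softmax([l + beta * r for l, r in zip(base_logits, rewards)])
    return steered, True
```

A ratio-based gate (for example, steering the top 20% of steps by entropy) would need the entropies of the whole sequence or a running estimate; the fixed threshold above is the simplest stand-in.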

Paper Structure (44 sections, 3 theorems, 20 equations, 17 figures, 1 table)

Key Result

Lemma 2.1

For a single step $t$, the loss in the KL-regularized objective incurred by using $\pi_{base}$ instead of $\pi^*$ is exactly the KL divergence $\Delta_t$.
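The lemma can be checked numerically on a toy distribution. The sketch below rests on assumptions not stated on this page: the single-step objective is taken to be $J(\pi) = \mathbb{E}_\pi[r] - \mathrm{KL}(\pi \,\|\, \pi_{base})$ with unit regularization weight, $\pi^*$ is its maximizer $\pi^* \propto \pi_{base} \exp(r)$, and $\Delta_t$ is read as $\mathrm{KL}(\pi_{base} \,\|\, \pi^*)$. The paper's exact definitions may differ, for instance by a scaling factor.

```python
import math


def kl(p, q):
    """KL(p || q) in nats for two discrete distributions on the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def objective(pi, pi_base, reward):
    """Assumed objective: expected reward minus KL(pi || pi_base)."""
    expected_reward = sum(p * r for p, r in zip(pi, reward))
    return expected_reward - kl(pi, pi_base)


def optimal_policy(pi_base, reward):
    """Maximizer of the assumed objective: pi* proportional to pi_base * exp(r)."""
    weights = [p * math.exp(r) for p, r in zip(pi_base, reward)]
    z = sum(weights)
    return [w / z for w in weights]


# Toy three-token example; the numbers are illustrative only.
pi_base = [0.5, 0.3, 0.2]
reward = [0.0, 1.0, -1.0]

pi_star = optimal_policy(pi_base, reward)
# Loss in objective value from using pi_base instead of pi*.
regret = objective(pi_star, pi_base, reward) - objective(pi_base, pi_base, reward)
# Assumed reading of Delta_t.
delta_t = kl(pi_base, pi_star)
```

Under these assumptions `regret` and `delta_t` agree up to floating-point error, because both reduce to $\log Z - \mathbb{E}_{\pi_{base}}[r]$ with $Z = \sum_a \pi_{base}(a)\, e^{r(a)}$.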

Figures (17)

  • Figure 1: Comparison of different gating strategies for critical junction identification.
  • Figure 2: Performance of SIA across different intervention ratios compared with multiple baselines under the normal setting. Dashed horizontal lines denote static baselines, while solid curves show performance under increasing sparse intervention.
  • Figure 3: Performance of SIA across different intervention ratios under the weak-to-strong generation setting.
  • Figure 4: Alignment performance under varying weighting factor $\beta$. The case of $\beta = +\infty$ represents an extreme scenario where the reward distribution completely overrides the LLM's candidate token distribution.
  • Figure 5: Frequency distribution of entropy-gated junctions across token positions. Darker blue regions indicate higher identification frequency. The red dashed lines mark the average sequence length (against the preset maximum of 256 tokens), providing a reference for observing the relative distribution of junctions.
  • ...and 12 more figures

Theorems & Definitions (5)

  • Lemma 2.1: Stepwise Alignment Regret
  • proof
  • Theorem 2.2: Sparse Steering Error Bound
  • Proposition 2.3: Noise Reduction via Sparsity
  • proof