PIXEL: Adaptive Steering Via Position-wise Injection with eXact Estimated Levels under Subspace Calibration
Manjiang Yu, Hongji Li, Priyanka Singh, Xue Li, Di Wang, Lijie Hu
TL;DR
PIXEL introduces a tuning-free, position-wise activation steering framework for LLMs. It learns a robust, attribute-aligned subspace from dual views (tail-averaged and end-token) and derives a closed-form minimal intervention strength $\alpha^{*}_{\ell,t}(s)$ to achieve a target cosine similarity $s$, enabling per-position edits with minimal disruption. Orthogonal residual calibration further adapts the global direction to sample-specific semantics, while dynamic position scanning selects receptive injection sites. The approach provides representation-level guarantees on margins and generalization and demonstrates strong, transferable improvements across models and alignment tasks, while preserving general capabilities. Overall, PIXEL offers a principled, scalable method for reliable, multi-attribute controllable generation in LLMs.
Abstract
Reliable behavior control is central to deploying large language models (LLMs) on the web. Activation steering offers a tuning-free route to aligning attributes (e.g., truthfulness) that ensure trustworthy generation. However, prevailing approaches rely on coarse heuristics and lack a principled account of where to steer and how strongly to intervene. To this end, we propose Position-wise Injection with eXact Estimated Levels (PIXEL), a position-wise activation steering framework. In contrast to prior work, PIXEL learns a property-aligned subspace from dual views (tail-averaged and end-token) and selects the intervention strength via a constrained geometric objective with a closed-form solution, thereby adapting to token-level sensitivity without global hyperparameter tuning. PIXEL further performs sample-level orthogonal residual calibration to refine the global attribute direction and employs a lightweight position-scanning routine to identify receptive injection sites. We additionally provide representation-level guarantees for the minimal-intervention rule, supporting reliable alignment. Across diverse models and evaluation paradigms, PIXEL consistently improves attribute alignment while preserving the model's general capabilities, offering a practical and principled method for controllable generation in LLMs. Our code is available at https://github.com/V1centNevwake/PIXEL-Adaptive-Steering
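To make the idea of a closed-form minimal intervention strength concrete, the sketch below works through a simplified single-direction case. It is an illustration under stated assumptions, not the paper's exact rule: it assumes a single steering direction `v` rather than a learned subspace, a target cosine `s` in `[0, 1)`, and an additive edit `h + alpha * v`. The function names `minimal_strength`, `steer`, and `cosine` are hypothetical. Writing `a` for the component of `h` along the unit direction and `b` for the norm of the orthogonal remainder, solving `cos(h + alpha * v, v) = s` for `alpha` gives `alpha = s * b / sqrt(1 - s^2) - a`, clipped at zero when the activation is already aligned enough.

```python
import math


def cosine(x, y):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(p * q for p, q in zip(x, y))
    return dot / (math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y)))


def minimal_strength(h, v, s):
    """Smallest alpha >= 0 such that cos(h + alpha * u, u) >= s, with u = v / ||v||.

    Assumption (not from the paper): one direction, additive edit, 0 <= s < 1,
    and h not exactly anti-parallel to v.
    """
    if not 0.0 <= s < 1.0:
        raise ValueError("target cosine s must lie in [0, 1)")
    v_norm = math.sqrt(sum(x * x for x in v))
    u = [x / v_norm for x in v]                    # unit steering direction
    a = sum(x * y for x, y in zip(h, u))           # component of h along u
    b = math.sqrt(sum((x - a * y) ** 2 for x, y in zip(h, u)))  # orthogonal norm
    alpha = s * b / math.sqrt(1.0 - s * s) - a     # solves cos(h + alpha*u, u) = s
    return max(alpha, 0.0), u                      # no edit if already aligned


def steer(h, v, s):
    """Apply the minimal additive edit to one position's activation."""
    alpha, u = minimal_strength(h, v, s)
    return [x + alpha * y for x, y in zip(h, u)]


if __name__ == "__main__":
    h = [3.0, 4.0]   # toy activation at one token position
    v = [1.0, 0.0]   # toy attribute direction
    steered = steer(h, v, 0.8)
    print(steered, cosine(steered, v))
```

Because the strength is computed separately for each activation, positions that already point along the direction receive no edit, while poorly aligned positions receive a larger one. This is the sense in which a per-position rule can replace a single global strength hyperparameter.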
