Global Evolutionary Steering: Refining Activation Steering Control via Cross-Layer Consistency

Xinyan Jiang, Wenjing Yu, Di Wang, Lijie Hu

Abstract

Activation engineering enables precise control over Large Language Models (LLMs) without the computational cost of fine-tuning. However, existing methods that derive steering vectors from static activation differences are susceptible to high-dimensional noise and layer-wise semantic drift, and often capture spurious correlations rather than the target intent. To address this, we propose Global Evolutionary Refined Steering (GER-steer), a training-free framework grounded in the geometric stability of the network's representation evolution. GER-steer exploits this global signal to rectify raw steering vectors, effectively decoupling robust semantic intent from orthogonal artifacts. Extensive evaluations confirm that GER-steer consistently outperforms baselines, delivering superior efficacy and generalization without layer-specific tuning, establishing a universal solution for reliable model alignment.

Paper Structure (75 sections, 4 theorems, 39 equations, 15 figures, 13 tables)

Key Result

Theorem 3.2

Let $\hat{\mathbf{u}}_1$ be the top singular vector of the perturbed matrix $M$, and $\mathbf{u}^*$ be the ground-truth direction. Assuming the noise level is bounded within the high signal-to-noise ratio (SNR) regime, specifically satisfying the condition $2\|E\|_2 < \|\boldsymbol{\lambda}\|_2$, then … where $\|E\|_2$ represents the spectral norm of the noise perturbation, and $\|\boldsymbol{\lambda}\|_2$ …
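
The setting of Theorem 3.2 can be explored numerically. The sketch below is an illustration under stated assumptions, not the paper's construction. It assumes a rank-one signal model $M = \mathbf{u}^* \boldsymbol{\lambda}^\top + E$, which the excerpt above does not spell out. The dimensions, the all-ones $\boldsymbol{\lambda}$, and the noise scale are hypothetical choices. It checks the condition $2\|E\|_2 < \|\boldsymbol{\lambda}\|_2$ and measures how closely the top singular vector of $M$ aligns with $\mathbf{u}^*$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_layers = 64, 16  # hypothetical hidden size and layer count

# Ground-truth unit direction u* (assumed for illustration).
u_star = rng.standard_normal(d)
u_star /= np.linalg.norm(u_star)

# Hypothetical per-layer signal strengths lambda; ||lambda||_2 = 4 here.
lam = np.ones(n_layers)
signal = np.outer(u_star, lam)  # assumed rank-one signal model

# Noise E, rescaled so that its spectral norm is 0.5.
noise = rng.standard_normal((d, n_layers))
noise *= 0.5 / np.linalg.norm(noise, 2)

# Perturbed matrix M and its top left singular vector.
M = signal + noise
U, S, Vt = np.linalg.svd(M, full_matrices=False)
u_hat = U[:, 0]

# High-SNR condition from the theorem statement: 2 * ||E||_2 < ||lambda||_2.
snr_ok = 2 * np.linalg.norm(noise, 2) < np.linalg.norm(lam)

# Singular vectors are defined up to sign, so compare via |cosine|.
alignment = abs(u_hat @ u_star)

print("high-SNR condition holds:", snr_ok)
print("|cos(u_hat, u*)|:", alignment)
```

Increasing the noise scale until the condition fails is a quick way to see where the estimated direction starts to drift away from $\mathbf{u}^*$.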

Figures (15)

  • Figure 1: Trajectory Consistency Analysis. The raw steering vector, typically estimated by subtracting the negative-region activations from the positive ones, fluctuates chaotically across certain layers. This occurs because estimation noise compromises the directional consistency of the positive and negative trajectories (red arrow), causing the local jitter vector obtained from the positive-negative difference to diverge sharply from the stable Global Evolutionary Direction. Such misalignment fails to drive the representation towards the positive target, ultimately causing the overall trajectory to deviate from the target morphology (blue dashed line). By aligning with the global consensus, the refined robust vector mitigates the local inconsistencies induced by such noise. Consequently, the refined trajectory successfully steers the test sample towards the target positive region.
  • Figure 2: Empirical Verification of Evolutionary Coherence. We perform Principal Component Analysis (PCA) on the tangent semantic vectors $\mathbf{g}_{l,i}$ aggregated from different layers. Observation: The first principal component (PC1) dominates the spectrum across all datasets. This spectral concentration reveals a substantial margin between the dominant signal and the residual variations, empirically validating the High SNR assumption ($2\|E\|_2 < \|\boldsymbol{\lambda}\|_2$) required for robust estimation.
  • Figure 3: Analysis of Steering Coefficients. Performance on Qwen2.5-7B vs. steering coefficient $\alpha$. GER-Steer (red) exhibits sharper control and greater stability than the baseline (blue). Additional results are shown in Appendix \ref{app:Coefficients}.
  • Figure 4: Hyperparameter Sensitivity Analysis on Qwen2.5-7B. Left: Impact of the rectification strength $\gamma$. Right: Impact of the number of steered layers $k$.
  • Figure 5: Utility Preservation on MMLU Benchmark. Impact of GER-Steer on general knowledge across 57 subjects. Colored bubbles (GER-Steer) consistently cluster around or above the baseline (gray).
  • ...and 10 more figures
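
The ideas behind Figures 1 and 2 can be sketched on synthetic data: a raw steering vector per layer as a difference of class means, followed by a check of how much of the stacked spectrum falls on a single shared direction. This is a minimal illustration, not the paper's pipeline. The activations are synthetic, the shared direction `u` and all sizes are hypothetical, and an uncentered SVD stands in for the PCA on $\mathbf{g}_{l,i}$ described in Figure 2.

```python
import numpy as np

rng = np.random.default_rng(1)
n_layers, n_samples, d = 12, 50, 32  # hypothetical sizes

# Hypothetical shared semantic direction (unit norm).
u = rng.standard_normal(d)
u /= np.linalg.norm(u)

# Raw steering vector per layer: mean(positive) - mean(negative),
# computed on synthetic activations with unit-variance Gaussian noise.
raw = []
for _ in range(n_layers):
    pos = 1.5 * u + rng.standard_normal((n_samples, d))
    neg = -1.5 * u + rng.standard_normal((n_samples, d))
    raw.append(pos.mean(axis=0) - neg.mean(axis=0))
raw = np.stack(raw)  # shape: (n_layers, d)

# Uncentered SVD of the stacked vectors (an assumption; the paper uses PCA).
_, s, vt = np.linalg.svd(raw, full_matrices=False)
top_share = s[0] ** 2 / np.sum(s ** 2)  # spectral energy on the top direction
global_dir = vt[0]

# Cosine between each layer's raw vector and the global direction.
cos_per_layer = raw @ global_dir / np.linalg.norm(raw, axis=1)
if cos_per_layer.mean() < 0:  # resolve the SVD sign ambiguity
    global_dir, cos_per_layer = -global_dir, -cos_per_layer

print("share of spectral energy on top direction:", top_share)
print("min cosine to global direction:", cos_per_layer.min())
```

Lowering `n_samples` or the signal scale makes the per-layer cosines scatter more widely, which mirrors the layer-wise jitter described in Figure 1.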

Theorems & Definitions (8)

  • Theorem 3.2: Stability of Global Direction
  • Corollary 3.3: Asymptotic Consistency
  • Remark 3.4: Spectral Decoupling via Layer-wise Differences
  • Lemma 1.1: Weyl's Inequality (Weyl, 1912)
  • Lemma 1.2: Wedin's $\sin \Theta$ Theorem (Wedin, 1972)
  • proof
  • proof
  • proof