
Subspace Control: Turning Constrained Model Steering into Controllable Spectral Optimization

Yancheng Huang, Changsheng Wang, Chongyu Fan, Yicheng Lang, Bingqi Shang, Yang Zhang, Mingyi Hong, Qing Qu, Alvaro Velasquez, Sijia Liu

Abstract

Foundation models, such as large language models (LLMs), are powerful but often require customization before deployment to satisfy practical constraints such as safety, privacy, and task-specific requirements, leading to "constrained" optimization problems for model steering and adaptation. However, solving such problems remains largely underexplored and is particularly challenging due to interference between the primary objective and constraint objectives during optimization. In this paper, we propose a subspace control framework for constrained model training. Specifically, (i) we first analyze, from a model merging perspective, how spectral cross-task interference arises and show that it can be resolved via a one-shot solution that orthogonalizes the merged subspace; (ii) we establish a connection between this solution and gradient orthogonalization in the spectral optimizer Muon; and (iii) building on these insights, we introduce SIFT (spectral interference-free training), which leverages a localization scheme to selectively intervene during optimization, enabling controllable updates that mitigate objective-constraint conflicts. We evaluate SIFT across four representative applications: (a) machine unlearning, (b) safety alignment, (c) text-to-speech adaptation, and (d) hallucination mitigation. Compared to both control-based and control-free baselines, SIFT consistently achieves substantial and robust performance improvements across all tasks. Code is available at https://github.com/OPTML-Group/SIFT.
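The abstract's central mechanism, gradient orthogonalization via the matrix sign function, amounts to mapping a gradient matrix to its polar (orthogonal) factor: for $G = U \Sigma V^\top$, the update keeps $U V^\top$, equalizing the spectrum so no single direction dominates. Muon approximates this with a Newton–Schulz iteration; the sketch below uses a plain SVD instead and is our illustrative reconstruction, not the authors' implementation.

```python
import numpy as np

def orthogonalize(G: np.ndarray) -> np.ndarray:
    """Map G = U @ diag(s) @ Vt to its polar factor U @ Vt.
    This is the exact matrix-sign / orthogonalization step that
    Muon approximates iteratively (sketch only, via SVD)."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

# A small random "gradient" matrix for illustration.
G = np.random.default_rng(0).normal(size=(4, 3))
O = orthogonalize(G)

# After orthogonalization, every singular value of the update is 1,
# so all spectral directions contribute equally.
print(np.allclose(np.linalg.svd(O, compute_uv=False), 1.0))
```

The same shape is preserved (`O` can replace `G` as a descent direction), but the update no longer concentrates its energy in the top singular subspace, which is what makes the subsequent subspace control tractable.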

Paper Structure

This paper contains 17 sections, 9 equations, 10 figures, 7 tables, and 1 algorithm.

Figures (10)

  • Figure 1: Schematic overview of the proposed subspace control framework, SIFT. (A) Performance across four model steering tasks (detailed in Tables \ref{tab:fg_spec} and \ref{tab:setup}), compared with the baseline BLUR [reisizadeh2025blur]. (B) When and where to control: SIFT enables selective intervention at targeted layers and training steps (i.e., spatial-temporal localization). (C) How to control: built on the spectral optimizer Muon, SIFT leverages gradient orthogonalization (the matrix sign function) to mitigate subspace interference.
  • Figure 2: Visualization of cosine similarity $\tau$ across optimization steps (temporal dimension) and model layers (spatial dimension) in LLM unlearning. The top and right marginal plots summarize the counts of $\tau < -0.1$ across steps and layers, respectively. Red stars $\star$ mark the steps and layers that need control.
  • Figure 3: Utility loss under different descent directions, starting from a model at step 35 of the unlearning process. Multiple update steps are performed along the full gradient, the projected gradient, and the removed component, respectively. The unlearning setup is the same as in Fig. \ref{fig:motivation_gradient_alignment}.
  • Figure 4: Sparse localization across applications. Temporal sparsity is the fraction of all training steps, and spatial sparsity is the fraction of all model components where SIFT is activated.
  • Figure 5: Sensitivity analysis of SIFT with respect to the top-$K$ subspace dimension. Left: effective rank of momentum matrices. Right: unlearning and utility performance under varying $K$.
  • ...and 5 more figures
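Figures 2 and 3 together suggest the localization-plus-projection recipe: flag a layer and step as conflicting when the cosine similarity $\tau$ between the task gradient and the constraint gradient falls below $-0.1$, then split the task gradient into the component along the constraint gradient (the "removed component" of Figure 3) and the orthogonal remainder (the "projected gradient"). The sketch below is our hypothetical reconstruction of that test; the function names and the exact trigger rule are assumptions, not the paper's code.

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two flattened gradient vectors."""
    return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def project_out(g: np.ndarray, d: np.ndarray):
    """Split g into its component along d ("removed") and the
    remainder orthogonal to d ("projected")."""
    removed = (g @ d) / (d @ d) * d
    return g - removed, removed

rng = np.random.default_rng(0)
g_task = rng.normal(size=128)                 # primary-objective gradient
g_con = -g_task + 0.5 * rng.normal(size=128)  # synthetic conflicting constraint gradient

tau = cosine(g_task, g_con)
conflict = tau < -0.1   # assumed spatial-temporal localization trigger (cf. Figure 2)
g_proj, g_removed = project_out(g_task, g_con)
```

After the split, `g_proj` is orthogonal to the constraint gradient, so descending along it avoids the objective-constraint interference that the full gradient `g_task = g_proj + g_removed` would incur.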