SineProject: Machine Unlearning for Stable Vision Language Alignment

Arpit Garg, Hemanth Saratchandran, Simon Lucey

TL;DR

This work identifies cross-modal projection geometry as the primary bottleneck hampering safe and effective unlearning in multimodal LLMs. It introduces SineProject, a simple, architecture-agnostic method that bounds projection perturbations via sinusoidal modulation of the frozen projector, yielding a better-conditioned Jacobian and stable vision-language alignment during forgetting. Theoretical analysis shows bounded Jacobian behavior under sinusoidal reparameterization, and extensive experiments on SafeEraser and MLLMU-Bench demonstrate improved forget-retain trade-offs, reduced inappropriate refusals, and minimal computational overhead across diverse architectures. The approach provides a practical pathway toward reliable, scalable unlearning in multimodal systems with real-world safety and privacy implications.

Abstract

Multimodal Large Language Models (MLLMs) increasingly need to forget specific knowledge, such as unsafe or private information, without requiring full retraining. However, existing unlearning methods often disrupt vision-language alignment, causing models to reject both harmful and benign queries. We trace this failure to the projector network: during unlearning, its Jacobian becomes severely ill-conditioned, leading to unstable optimization and drift in cross-modal embeddings. We introduce SineProject, a simple method that augments the frozen projector with sinusoidally modulated trainable parameters, improving the Jacobian's spectral conditioning and stabilizing alignment throughout unlearning. Across standard safety and privacy unlearning benchmarks using LLaVA v1.5 7B and 13B, SineProject reduces benign query refusals while achieving complete forgetting of targeted information, yielding state-of-the-art forget-retain trade-offs with negligible computational overhead.
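To make the idea of "augmenting the frozen projector with sinusoidally modulated trainable parameters" concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes a two-layer MLP projector with parameters $(W_1, b_1, W_2, b_2)$, a GELU activation, and an additive form $W + \sin(\alpha \cdot \Delta W)$ for the effective weights. The class name `SineModulatedProjector` and the attributes `delta1` and `delta2` are hypothetical.

```python
import numpy as np


def gelu(x):
    # tanh approximation of GELU; the activation choice is an assumption
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


class SineModulatedProjector:
    """Hypothetical two-layer projector: frozen weights plus sin(alpha * delta) offsets."""

    def __init__(self, w1, b1, w2, b2, alpha=1.0):
        # Frozen pretrained parameters (never updated in this sketch).
        self.w1, self.b1, self.w2, self.b2 = w1, b1, w2, b2
        self.alpha = alpha
        # Trainable offsets; zero initialisation leaves the projector unchanged.
        self.delta1 = np.zeros_like(w1)
        self.delta2 = np.zeros_like(w2)

    def effective_weights(self):
        # Each entry moves by at most 1 from its frozen value, because |sin| <= 1.
        return (self.w1 + np.sin(self.alpha * self.delta1),
                self.w2 + np.sin(self.alpha * self.delta2))

    def __call__(self, x):
        w1, w2 = self.effective_weights()
        return gelu(x @ w1.T + self.b1) @ w2.T + self.b2


rng = np.random.default_rng(0)
w1, b1 = rng.standard_normal((8, 4)), np.zeros(8)
w2, b2 = rng.standard_normal((6, 8)), np.zeros(6)
proj = SineModulatedProjector(w1, b1, w2, b2, alpha=1.0)

x = rng.standard_normal((3, 4))
y_frozen = proj(x)  # offsets are zero, so this equals the frozen projector's output

# Even with very large trainable offsets, the effective weights stay close to the frozen ones.
proj.delta1 = 1e3 * rng.standard_normal(w1.shape)
proj.delta2 = 1e3 * rng.standard_normal(w2.shape)
eff1, eff2 = proj.effective_weights()
max_shift = max(np.abs(eff1 - w1).max(), np.abs(eff2 - w2).max())
```

Under these assumptions, the perturbation to each weight entry is capped at 1 regardless of how far the trainable offsets drift, which is one way to read the "bounded projection perturbations" described in the TL;DR.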

Paper Structure

This paper contains 44 sections, 3 theorems, 56 equations, 8 figures, 16 tables.

Key Result

Theorem 3.1

Let $\nabla F$ denote the Jacobian of $F$ with respect to the parameters $(W_1, b_1, W_2, b_2)$. The sine projector network is defined as where $\sin(\cdot)$ denotes the element-wise sine applied to each matrix element. Let $\nabla G$ denote the Jacobian of $G$ with respect to the parameter set $(W_1, b_1, W_2, b_2)$. We then have: Consequently, as the magnitudes of $W_1$ and $W_2$ increase
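As a reminder of why an element-wise sine keeps quantities bounded, the following elementary facts hold for any scalar weight $w$. This is a sketch of the underlying intuition, not the theorem's statement or its proof.

```latex
\[
  |\sin(w)| \le 1,
  \qquad
  \frac{\partial}{\partial w}\sin(w) = \cos(w),
  \qquad
  |\cos(w)| \le 1
  \quad \text{for all } w \in \mathbb{R}.
\]
```

Both the entries of $\sin(W)$ and the chain-rule factor picked up when differentiating through the sine therefore stay within $[-1, 1]$, however large the entries of $W$ become.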

Figures (8)

  • Figure 1: Vision–language alignment degrades during unlearning but is preserved by SineProject. The figure shows cosine similarity matrices between projected vision features ($\mathbf{h}_i$, rows) and text embeddings ($\mathbf{t}_j$, columns) on 100 matched image-caption pairs, where entry $(i,j)$ is $\cos(\mathbf{h}_i, \mathbf{t}_j)$. A strong diagonal (red) indicates correct pairing, and off-diagonal red indicates spurious correlations. Both methods start from the same pretrained model with clear diagonal alignment (Epoch 1). After seven epochs, SafeEraser [chen2025safeeraser] exhibits diagonal degradation and increased off-diagonal noise, whereas SineProject preserves the alignment structure and multimodal coherence.
  • Figure 2: Geometric stability across unlearning epochs. (a) Stability of the first projection layer during unlearning. SineProject (blue) maintains stable conditioning, whereas SafeEraser (red) degrades moderately. (b) Stability of the second projection layer. SafeEraser exhibits severe instability ($> 10^6$), whereas SineProject remains well conditioned ($< 10^3$). (c) Modality Integration Rate (MIR). The shaded region indicates the optimal range $[2.5, 3.0]$. SineProject converges within this range; SafeEraser diverges to MIR $> 4.5$, indicating alignment drift.
  • Figure 3: Spectral dynamics during unlearning. Evolution of singular values for $W_1$ (solid) and $W_2$ (dashed) across seven epochs. GD, GD+PD, and PO+PD are SafeEraser baselines; SineProject extends PO+PD with sinusoidal modulation. (a) Maximum singular values $\sigma_{\max}$ (computed via Lanczos bidiagonalization [lanczos1950iteration]). Lower values indicate bounded updates. (b) Minimum singular values $\sigma_{\min}$ (via eigendecomposition [trefethen2022numerical]). Higher values indicate better matrix conditioning. SineProject maintains stable $\sigma_{\max}$ and $\sigma_{\min}$, achieving 2–4 orders of magnitude better conditioning than the baselines. Ablations are reported in the supplementary material.
  • Figure 4: Robustness to modulation strength $\alpha$ in $\sin(\alpha \cdot \Delta W)$. (a) SARR remains stable across $\alpha \in [1, 300]$ with variation $<0.3\%$, and all variants significantly outperform the baseline (horizontal dashed line at 30.3%). (b) All metrics, normalized to the $\alpha=1$ setting, vary within $\pm1\%$, demonstrating that SineProject's benefits arise from the bounded transformation rather than from hyperparameter tuning. Shaded regions indicate $\pm1\sigma$ across three seeds.
  • Figure 5: Initialization sensitivity across 10 random seeds for projection weight initialization. (a) The SARR distribution (violin plots) shows that SineProject achieves a 74% lower standard deviation (0.15% vs 0.58%), with tighter clustering around the median. (b) The Jacobian condition number of $W_2$ remains stable for SineProject (mean: $5.4 \times 10^2$, std: $3.2 \times 10^1$), while the baseline exhibits high variance (mean: $1.01 \times 10^6$, std: $6.8 \times 10^4$). (c) The coefficient of variation across all metrics demonstrates consistent variance reduction, validating robustness to initialization.
  • ...and 3 more figures
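The two diagnostics that recur in the captions above, the cosine similarity matrix of Figure 1 and the ratio $\sigma_{\max}/\sigma_{\min}$ behind Figures 2, 3, and 5, can be sketched in a few lines of NumPy. This is an illustrative sketch on synthetic data, not the paper's evaluation code. It uses a dense SVD in place of the Lanczos and eigendecomposition routines named in Figure 3, and the function names `cosine_similarity_matrix` and `condition_number` are hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(h, t, eps=1e-12):
    """Entry (i, j) is cos(h_i, t_j) for rows h_i of h and rows t_j of t."""
    h_unit = h / np.maximum(np.linalg.norm(h, axis=1, keepdims=True), eps)
    t_unit = t / np.maximum(np.linalg.norm(t, axis=1, keepdims=True), eps)
    return h_unit @ t_unit.T


def condition_number(w):
    """sigma_max / sigma_min, computed from a dense SVD (sorted in descending order)."""
    s = np.linalg.svd(w, compute_uv=False)
    return s[0] / s[-1]


# Synthetic stand-ins for 100 matched pairs: vision features are noisy copies of text embeddings.
rng = np.random.default_rng(0)
t = rng.standard_normal((100, 64))
h = t + 0.1 * rng.standard_normal((100, 64))

sim = cosine_similarity_matrix(h, t)
diag_mean = np.mean(np.diag(sim))                                # matched pairs
off_diag_mean = (sim.sum() - np.trace(sim)) / (sim.size - len(sim))  # mismatched pairs

# A toy weight matrix with known singular values 10, 1, and 0.01.
w = np.diag([10.0, 1.0, 0.01])
kappa = condition_number(w)
```

A well-aligned model in the sense of Figure 1 shows a diagonal mean far above the off-diagonal mean. In the toy matrix, the condition number is the ratio of the largest to the smallest singular value, which is the quantity that the captions report growing past $10^6$ for the baseline.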

Theorems & Definitions (6)

  • Theorem 3.1
  • Proposition B.1
  • proof
  • Proposition B.2
  • proof
  • proof: Proof of Theorem 3.1