Geometric-Disentanglement Unlearning

Duo Zhou, Yuji Zhang, Tianxin Wei, Ruizhong Qiu, Ke Yang, Xiao Lin, Cheng Qian, Jingrui He, Hanghang Tong, Heng Ji, Huan Zhang

TL;DR

Geometric-disentanglement Unlearning (GU) is a plug-and-play method that decomposes any candidate forget-gradient update into a component tangential to the retain space and a component normal to it, and executes only the normal component.
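The tangential/normal split can be illustrated with a small NumPy sketch. It assumes the Euclidean metric (H = I) and a small, explicit set of retain gradients stacked as rows; the function name `decompose` is hypothetical and is not the authors' implementation.

```python
import numpy as np

def decompose(g_forget, retain_grads):
    """Split g_forget into tangential and normal parts w.r.t. span(retain_grads).

    Assumes the Euclidean metric (H = I); retain_grads has shape (k, p),
    one retain gradient per row.
    """
    G = np.asarray(retain_grads, dtype=float)
    g = np.asarray(g_forget, dtype=float)
    # Least-squares coefficients c minimizing ||G.T @ c - g||; G.T @ c is then
    # the orthogonal projection of g onto the retain-gradient subspace.
    coeffs, *_ = np.linalg.lstsq(G.T, g, rcond=None)
    tangential = G.T @ coeffs
    normal = g - tangential  # orthogonal to every retain gradient
    return tangential, normal

retain_grads = np.array([[1.0, 0.0, 0.0],
                         [0.0, 1.0, 0.0]])
g_forget = np.array([2.0, -1.0, 3.0])
tangential, normal = decompose(g_forget, retain_grads)
print(tangential, normal)
```

Here the retain gradients span the first two coordinates, so only the third coordinate of the forget gradient survives as the normal component.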

Abstract

Machine unlearning, the removal of a training subset's influence from a deployed model, is critical for privacy preservation and model reliability, yet gradient ascent on forget samples often harms retained knowledge. Existing approaches face a persistent tradeoff between effective forgetting and preservation of the retain set. While previous methods provide useful heuristics, they often lack a formal analysis of how exactly forgetting updates harm retained knowledge, and of whether these side effects can be removed with theoretical guarantees. To find a theoretically sound and simple solution, we start from first principles, with a first-order analysis of how the retain loss changes locally under small parameter updates during training. This yields a crisp equivalence: the retain loss is unchanged to first order iff the update direction is orthogonal to the subspace spanned by retain gradients ("retain-invariant"). The equivalence identifies the entangled component as the tangential part of the forget update within the retain-gradient subspace, and characterizes disentanglement as orthogonality. Guided by this, we propose Geometric-disentanglement Unlearning (GU), which decomposes any candidate forget-gradient update into components tangential and normal to the retain space and executes only the normal component. Under a standard trust-region budget, the projected direction aligned with the raw forget gradient is optimal among all first-order retain-invariant moves, and we also derive the optimal projected direction for joint forget-retain objectives. Our method is plug-and-play and can be attached to existing gradient-based unlearning procedures to mitigate side effects. GU achieves consistent improvements for various methods across three benchmarks: TOFU, MUSE, and WMDP.
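The two claims in the abstract, first-order retain invariance and trust-region optimality of the projected direction, can be checked numerically on a toy problem. This sketch assumes a single quadratic retain loss (so its one gradient spans the retain subspace), a random hypothetical forget gradient, and the Euclidean metric with a unit-norm trust region; none of these choices come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 5
M = rng.standard_normal((p, p))
Q = M @ M.T + np.eye(p)          # positive definite curvature of the toy retain loss
c = rng.standard_normal(p)

def retain_loss(theta):
    # Toy quadratic retain loss, an assumption for illustration only.
    return 0.5 * theta @ Q @ theta + c @ theta

theta = rng.standard_normal(p)
g_r = Q @ theta + c              # retain gradient at theta (spans T_r here)
g_f = rng.standard_normal(p)     # hypothetical forget gradient

# Normal component: remove the part of g_f along g_r (Euclidean metric).
normal = g_f - (g_f @ g_r) / (g_r @ g_r) * g_r

for eta in (1e-1, 1e-2, 1e-3):
    d_raw = retain_loss(theta + eta * g_f) - retain_loss(theta)
    d_proj = retain_loss(theta + eta * normal) - retain_loss(theta)
    # d_raw / eta stays near g_r @ g_f; d_proj / eta shrinks linearly with eta,
    # because the projected step only changes the loss at second order.
    print(f"eta={eta:g}  raw/eta={d_raw / eta:+.4f}  proj/eta={d_proj / eta:+.6f}")

# Trust-region check: among unit vectors orthogonal to g_r, the normalized
# normal component maximizes the first-order forget gain g_f @ v.
v_star = normal / np.linalg.norm(normal)
best_random = -np.inf
for _ in range(1000):
    v = rng.standard_normal(p)
    v -= (v @ g_r) / (g_r @ g_r) * g_r   # make v retain-invariant
    v /= np.linalg.norm(v)
    best_random = max(best_random, g_f @ v)
print("optimal gain:", g_f @ v_star, " best random:", best_random)
```

The optimality check reflects a one-line argument: for any $v$ orthogonal to $g_r$, $g_f^\top v = (P_\perp g_f)^\top v \le \|P_\perp g_f\|$ by Cauchy-Schwarz, with equality at $v = P_\perp g_f / \|P_\perp g_f\|$.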

Paper Structure

This paper contains 62 sections, 7 theorems, 42 equations, 3 figures, 4 tables, and 1 algorithm.

Key Result

Proposition 3.1

Formally, fix $\theta\in\mathbb{R}^p$ and an optimizer-induced symmetric positive definite metric $H\succ0$ with inner product $\langle u,v\rangle_H := u^\top H v$ and norm $\|v\|_H := \sqrt{\langle v,v\rangle_H}$. For each retain sample $x_r\in D_r$, assume $\ell_r(x_r;\theta)$ is differentiable in $\theta$; let $T_r(\theta)$ be the retain tangent subspace spanned by the retain gradients, and let $T_r(\theta)^\perp := \{\,v\in\mathbb{R}^p:\ \langle v,\,g\rangle_H = 0\ \ \forall\, g\in T_r(\theta)\,\}$ be its $H$-orthogonal complement. …
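For a general metric $H$, the $H$-orthogonal projector onto a subspace spanned by the columns of a matrix $U$ is $P_{T_r} = U (U^\top H U)^{+} U^\top H$, and $P_\perp = I - P_{T_r}$. The NumPy sketch below builds both projectors. Which vectors span $T_r$ is our assumption: we take $u_i = H^{-1}\nabla_\theta \ell_r(x_i;\theta)$, so that $\langle v, u_i\rangle_H = v^\top \nabla_\theta \ell_r(x_i;\theta)$ is exactly the first-order change of each retain loss along $v$. The diagonal $H$ and the function name `h_projectors` are hypothetical.

```python
import numpy as np

def h_projectors(U, H):
    """H-orthogonal projectors onto span(U) and onto its H-orthogonal complement.

    U: (p, k) array whose columns span the retain tangent subspace T_r.
    H: (p, p) symmetric positive definite metric.
    """
    gram = U.T @ H @ U                          # (k, k) Gram matrix <u_i, u_j>_H
    P_T = U @ np.linalg.pinv(gram) @ U.T @ H    # pinv tolerates linearly dependent columns
    P_perp = np.eye(H.shape[0]) - P_T
    return P_T, P_perp

rng = np.random.default_rng(0)
p, k = 6, 3
H = np.diag(rng.uniform(0.5, 2.0, p))           # assumed diagonal SPD metric
retain_grads = rng.standard_normal((p, k))      # columns: per-sample retain gradients
U = np.linalg.solve(H, retain_grads)            # assumed basis: H^{-1} @ retain gradient
P_T, P_perp = h_projectors(U, H)

v = rng.standard_normal(p)                      # any candidate forget update
v_perp = P_perp @ v
# <v_perp, u_i>_H = v_perp^T grad_i = 0, so each retain loss is unchanged to first order.
print(np.abs(retain_grads.T @ v_perp).max())
```

The printed value should be at the level of floating-point round-off.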

Figures (3)

  • Figure 1: Geometric Unlearning (bottom) vs. baseline (top). $P_{\perp}$ is the $H$-orthogonal projector onto the complement of the retain tangent subspace $T_r$; $P_{T_r}$ projects onto $T_r$. Without changing the training objective or adding regularization, we route existing gradients through orthogonal projectors.
  • Figure 2: We visualize forgetting quality (ES Un: lower is better) against retained knowledge (ES Re), privacy (Priv), and model utility (MU) for eight unlearning baselines on TOFU. ES Re, Priv, and MU are higher-is-better metrics. Circles denote baseline outputs, triangles denote results with GU, and arrows indicate the shift from Base $\rightarrow$ GU. Across all three panels, GU pushes methods toward the Pareto-optimal corner (upper-right), reducing the trade-off between forgetting and retaining.
  • Figure 3: We visualize forgetting quality (ES Un: lower is better) against retained knowledge (ROUGE Re) and privacy (Priv Leak) for eight unlearning baselines on MUSE, and Un. acc. vs. MMLU acc. on WMDP. ROUGE Re and MMLU acc. are higher-is-better metrics; for Priv Leak, values closer to 0 are better.
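The plug-and-play idea in Figure 1 (keep the base method's update, but route it through $P_\perp$ before stepping) can be sketched on a toy logistic-regression model. Everything here is an assumption for illustration: the data, the gradient-ascent base method, the Euclidean metric, the learning rate, and the helper names `project_out` and `logistic_loss_and_grads`. How GU forms the retain subspace at scale is not shown.

```python
import numpy as np

def project_out(update, retain_grads):
    """Remove the component of `update` lying in span(retain_grads) (H = I assumed)."""
    G = np.atleast_2d(retain_grads)
    coeffs, *_ = np.linalg.lstsq(G.T, update, rcond=None)
    return update - G.T @ coeffs

def logistic_loss_and_grads(theta, X, y):
    """Per-sample logistic losses and gradients for labels y in {-1, +1}."""
    margins = y * (X @ theta)
    losses = np.logaddexp(0.0, -margins)        # log(1 + exp(-margin))
    coef = -y / (1.0 + np.exp(margins))         # d loss / d logit
    return losses, coef[:, None] * X            # grads: shape (n, p)

rng = np.random.default_rng(1)
p = 10
X_r, y_r = rng.standard_normal((4, p)), rng.choice([-1.0, 1.0], 4)
X_f, y_f = rng.standard_normal((4, p)), rng.choice([-1.0, 1.0], 4)
theta = rng.standard_normal(p)
lr = 0.5

_, g_f_per = logistic_loss_and_grads(theta, X_f, y_f)
_, g_r_per = logistic_loss_and_grads(theta, X_r, y_r)
ascent = g_f_per.mean(axis=0)                   # base method: gradient ascent on forget loss
gu_update = project_out(ascent, g_r_per)        # keep only the normal component

before = logistic_loss_and_grads(theta, X_r, y_r)[0].mean()
after_raw = logistic_loss_and_grads(theta + lr * ascent, X_r, y_r)[0].mean()
after_gu = logistic_loss_and_grads(theta + lr * gu_update, X_r, y_r)[0].mean()
print(f"retain loss: before={before:.4f} raw={after_raw:.4f} projected={after_gu:.4f}")
```

Because each per-sample retain gradient of this linear model is a nonzero multiple of its input, the projected step leaves every retain logit, and hence the retain loss, unchanged exactly; for nonlinear models the guarantee holds only to first order.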

Theorems & Definitions (14)

  • Proposition 3.1
  • Lemma 3.2: Steepest feasible descent under first-order safety
  • Proposition 3.3: First-order safety and retain monotonicity
  • Corollary 3.4: Descent guarantee for $L_r$ under $H$-smoothness
  • Proposition 3.5: Exact first-order change of $\mathcal{L}_{\text{joint}}$
  • Proposition C.1: Retain gradient subspace and $H$-orthogonality
  • proof
  • proof
  • proof
  • proof
  • ...and 4 more