Understanding Machine Unlearning Through the Lens of Mode Connectivity

Jiali Cheng, Hadi Amiri

TL;DR

This work introduces Mode Connectivity in Unlearning (MCU), a framework for analyzing how unlearning methods navigate the loss landscape between two unlearned minimizers derived from the same base model. By probing different training dynamics, namely curriculum learning (CL) and second-order optimization (SO), and comparing different unlearning objectives across datasets (TOFU and MU-Bench), the study reveals that barrier-free, low-loss paths often exist, but their existence and smoothness depend on the forget-set size, task, and method. Importantly, MCU shows that a shared low-loss manifold does not guarantee uniform performance across evaluation metrics: two methods can be mechanistically similar yet still differ on individual metrics. The results offer a diagnostic lens for unlearning methods, suggest scenarios where intermediate models along an MCU curve can outperform the endpoints, and indicate that CL and SO can either help or hinder unlearning depending on the context. Overall, MCU provides insights into the stability, interpretability, and design of robust unlearning strategies, with potential for ensemble-style use of interpolated models.

Abstract

Machine unlearning aims to remove undesired information from trained models without requiring full retraining from scratch. Despite recent advances, the loss landscapes and optimization dynamics underlying unlearning methods have received little attention. In this paper, we investigate and analyze machine unlearning through the lens of mode connectivity: the phenomenon where independently trained models can be connected by smooth low-loss paths in the parameter space. We define and study mode connectivity in unlearning across a range of overlooked conditions, including connections between different unlearning methods, models trained with and without curriculum learning, and models optimized with first-order and second-order techniques. Our findings show distinct patterns of fluctuation of different evaluation metrics along the curve, as well as the mechanistic (dis)similarity between unlearning methods. To the best of our knowledge, this is the first study on mode connectivity in the context of machine unlearning.

Paper Structure

This paper contains 54 sections, 8 equations, 24 figures, 1 table.

Figures (24)

  • Figure 1: (a) Illustration of standard mode connectivity (MC): MC finds a smooth curve connecting two minimizers that yields consistent low loss on $D$. (b) Illustration of mode connectivity in unlearning (MCU): unlearning removes knowledge of forget set $D_f$ from the trained model $f_{\theta_o}$ while maintaining knowledge of retain set $D_r = D \setminus D_f$. MCU finds a smooth curve connecting the two unlearned models $\theta_1'$ and $\theta_2'$ that yields consistent low loss on $D_r$ and high loss on $D_f$. See details in §\ref{sec:mcu}.
  • Figure 2: MCU under Rand setting on TOFU dataset. Additional results are shown in Appendix \ref{sec:additional_result}, Figures \ref{fig:tofu-rand}--\ref{fig:tofu-fo-so} for TOFU and Figures \ref{fig:cls-rand}--\ref{fig:cls-fo-so} for classification tasks.
  • Figure 3: MCU under Met setting on TOFU dataset. Methods on rows and columns correspond to $\theta_1'$ and $\theta_2'$ respectively. In (a), linear MCU is symmetric. In (b), quadratic MCU is asymmetric, as the curve is optimized using the methods shown in rows. Additional results are shown in Appendix \ref{sec:additional_result}, Figures \ref{fig:tofu-met}--\ref{fig:tofu-met-fo-so} for TOFU and Figures \ref{fig:cls-met}--\ref{fig:cls-met-fo-so} for classification tasks.
  • Figure 4: MCU on DDI dataset.
  • Figure 5: MCU under Rand setting on TOFU dataset.
  • ...and 19 more figures

Theorems & Definitions (1)

  • Definition 1: Mode Connectivity in Unlearning (MCU)
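
The captions for Figures 1 and 3 describe linear and quadratic curves connecting two unlearned models $\theta_1'$ and $\theta_2'$, with low loss on $D_r$ and high loss on $D_f$ along the way. The following is a minimal toy sketch of that idea, not the paper's implementation. Everything in it is an assumption made for illustration: the two-dimensional "parameters", the stand-in functions `retain_loss` and `forget_loss`, the hand-picked control point (in practice the curve would be optimized), and the simple barrier measure in `retain_barrier`.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Row = Tuple[float, float, float]  # (t, retain loss, forget loss)


def linear_path(theta1: Sequence[float], theta2: Sequence[float], t: float) -> Vector:
    """Point on the straight line from theta1 (t=0) to theta2 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(theta1, theta2)]


def quadratic_bezier_path(
    theta1: Sequence[float], control: Sequence[float], theta2: Sequence[float], t: float
) -> Vector:
    """Point on a quadratic Bezier curve with one control point."""
    return [
        (1.0 - t) ** 2 * a + 2.0 * t * (1.0 - t) * c + t ** 2 * b
        for a, c, b in zip(theta1, control, theta2)
    ]


def retain_loss(theta: Sequence[float]) -> float:
    """Hypothetical stand-in for the loss on D_r: zero on the unit circle."""
    return (sum(x * x for x in theta) - 1.0) ** 2


def forget_loss(theta: Sequence[float]) -> float:
    """Hypothetical stand-in for the loss on D_f: squared distance from the origin."""
    return sum(x * x for x in theta)


def scan_path(path: Callable[[float], Vector], num_points: int = 11) -> List[Row]:
    """Evaluate both toy losses at evenly spaced t in [0, 1]."""
    rows = []
    for i in range(num_points):
        t = i / (num_points - 1)
        theta = path(t)
        rows.append((t, retain_loss(theta), forget_loss(theta)))
    return rows


def retain_barrier(rows: Sequence[Row]) -> float:
    """One simple barrier measure: peak retain loss minus the worse endpoint."""
    worse_endpoint = max(rows[0][1], rows[-1][1])
    return max(r for _, r, _ in rows) - worse_endpoint


# Two toy "unlearned models", both minimizers of the toy retain loss.
theta1 = [1.0, 0.0]
theta2 = [0.0, 1.0]
control = [1.0, 1.0]  # hand-picked here; a real curve would be trained

linear_rows = scan_path(lambda t: linear_path(theta1, theta2, t))
curve_rows = scan_path(lambda t: quadratic_bezier_path(theta1, control, theta2, t))

print("linear barrier:   ", retain_barrier(linear_rows))
print("quadratic barrier:", retain_barrier(curve_rows))
print("min forget loss (linear):   ", min(f for _, _, f in linear_rows))
print("min forget loss (quadratic):", min(f for _, _, f in curve_rows))
```

In this toy setting the straight line cuts through a region of higher retain loss and lower forget loss, while the curved path stays close to the low-retain-loss circle. That is only meant to show why a linear and a quadratic connection between the same endpoints can behave differently; it says nothing about the paper's actual results.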