Understanding Machine Unlearning Through the Lens of Mode Connectivity
Jiali Cheng, Hadi Amiri
TL;DR
This work introduces Mode Connectivity in Unlearning (MCU), a framework for analyzing how unlearning methods navigate the loss landscape between two unlearned minimizers derived from the same base model. The study probes diverse training dynamics, namely curriculum learning (CL) and second-order (SO) optimization, and compares different unlearning objectives across the TOFU and MU-Bench benchmarks. It finds that barrier-free, low-loss paths often exist, but that their existence and smoothness depend on the forget-set size, the task, and the unlearning method. Importantly, MCU shows that a shared low-loss manifold does not guarantee uniform performance across evaluation metrics. It highlights mechanistic similarities between methods while exposing metric-dependent differences. The results offer a diagnostic lens for unlearning methods and identify scenarios where intermediate models along the connecting path can outperform the endpoints. They also show that CL and SO can either help or hinder unlearning depending on the context. Overall, MCU provides actionable insights into the stability, interpretability, and design of robust unlearning strategies, with potential for ensemble-style use of interpolated models.
Abstract
Machine unlearning aims to remove undesired information from trained models without requiring full retraining from scratch. Despite recent advances, the loss landscapes and optimization dynamics underlying unlearning methods have received less attention. In this paper, we investigate and analyze machine unlearning through the lens of mode connectivity, the phenomenon whereby independently trained models can be connected by smooth, low-loss paths in parameter space. We define and study mode connectivity in unlearning across a range of overlooked conditions. These include connections between different unlearning methods, between models trained with and without curriculum learning, and between models optimized with first-order and second-order techniques. Our findings reveal distinct patterns of fluctuation in different evaluation metrics along the curve, as well as mechanistic (dis)similarities between unlearning methods. To the best of our knowledge, this is the first study of mode connectivity in the context of machine unlearning.
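To make "smooth low-loss paths" and "barrier-free" concrete, the following is a minimal sketch and not the authors' implementation. It assumes the quadratic Bezier curve parameterization commonly used in the mode-connectivity literature, phi(t) = (1-t)^2 a + 2t(1-t) m + t^2 b, where a and b are the two endpoint models and m is a control point. It also assumes one common barrier definition: the largest excess of the path loss over the straight line joining the endpoint losses. The two-parameter `toy_loss` and the hand-picked control point are hypothetical stand-ins. In practice, the endpoints would be two unlearned models, and m would be trained to minimize the loss along the curve rather than chosen by hand.

```python
import numpy as np


def bezier_point(theta_a, theta_b, theta_mid, t):
    # Quadratic Bezier curve: phi(0) = theta_a, phi(1) = theta_b; theta_mid is the control point.
    return (1 - t) ** 2 * theta_a + 2 * t * (1 - t) * theta_mid + t ** 2 * theta_b


def loss_along_curve(loss_fn, theta_a, theta_b, theta_mid, num_points=21):
    # Evaluate the loss at evenly spaced points t in [0, 1] along the curve.
    ts = np.linspace(0.0, 1.0, num_points)
    losses = np.array([loss_fn(bezier_point(theta_a, theta_b, theta_mid, t)) for t in ts])
    return ts, losses


def barrier_height(ts, losses):
    # Largest excess of the path loss over the straight line joining the endpoint losses;
    # a value near zero indicates a (numerically) barrier-free path.
    baseline = (1 - ts) * losses[0] + ts * losses[-1]
    return float(np.max(losses - baseline))


# Hypothetical toy loss whose minimizers form the unit circle ||theta|| = 1.
def toy_loss(theta):
    return float((theta @ theta - 1.0) ** 2)


theta_a = np.array([1.0, 0.0])   # hypothetical stand-in for the first unlearned model
theta_b = np.array([-1.0, 0.0])  # hypothetical stand-in for the second unlearned model

# Placing the control point at the midpoint reduces the Bezier curve to linear interpolation.
ts, linear_losses = loss_along_curve(toy_loss, theta_a, theta_b, (theta_a + theta_b) / 2)
# A control point bent away from the origin keeps the curve close to the ring of minimizers.
_, curved_losses = loss_along_curve(toy_loss, theta_a, theta_b, np.array([0.0, 2.0]))

linear_barrier = barrier_height(ts, linear_losses)
curved_barrier = barrier_height(ts, curved_losses)
print(f"linear barrier: {linear_barrier:.3f}, curved barrier: {curved_barrier:.3f}")
```

On this toy landscape, the straight path crosses the high-loss origin and has a barrier of 1.0. The bent curve stays near the ring of minimizers, so its barrier is only about 0.06. This contrast illustrates why connectivity is assessed along learned curves rather than only along linear interpolations. The same per-point loop could also record additional evaluation metrics at each t, which is where metric-dependent differences along a shared low-loss path would appear.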
