Improving Unlearning with Model Updates Probably Aligned with Gradients

Virgile Dine; Teddy Furon; Charly Faure

TL;DR

This work reframes machine unlearning as a constrained optimization problem that balances forgetting protected data against preserving utility. It introduces feasible updates based on masking and on a model of batch-gradient noise; these act as add-ons that can be plugged into existing first-order unlearning methods. The authors provide a theoretical foundation via KKT conditions and a probabilistic masking framework, including a focus vector that guides updates under gradient uncertainty. Experiments across 360 configurations on computer vision tasks show improved privacy metrics, such as membership inference attack (MIA) and relative unlearning accuracy, with manageable trade-offs in accuracy and runtime. The approach offers a practical, principled way to enhance unlearning methods without retraining from scratch, and it applies to privacy, robustness, and fairness concerns in real-world systems.

Abstract

We formulate the machine unlearning problem as a general constrained optimization problem. This formulation unifies the first-order methods from the approximate machine unlearning literature. The paper then introduces the concept of feasible updates: parameter update directions that help with unlearning while not degrading the utility of the initial model. Our design of feasible updates is based on masking, i.e., a careful selection of the model's parameters worth updating. It also takes into account the estimation noise of the gradients when processing each batch of data, so that locally feasible updates are derived with a statistical guarantee. The technique can be plugged in, as an add-on, to any first-order approximate unlearning method. Experiments with computer vision classifiers validate this approach.
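The abstract does not spell out how the mask is built, so the following is only an illustrative sketch under stated assumptions, not the authors' algorithm. It assumes a hypothetical function `masked_feasible_update` that receives per-batch gradients of an unlearning objective and of a utility (retain) loss, treats the averaged unlearning gradient as exact, and models noise only on the retain gradient through a per-coordinate standard error. A coordinate is kept when a plain descent step on the unlearning objective is, with a one-sided confidence level set by the assumed parameter `z`, not an ascent step on the retain loss.

```python
import math
from statistics import mean, stdev

def masked_feasible_update(forget_grads, retain_grads, z=1.645):
    """Illustrative masked update (an assumption, not the paper's method).

    forget_grads, retain_grads: lists of per-batch gradient vectors
    (each a list of floats, one entry per parameter; at least two batches).
    Returns (update, mask): the masked descent direction and the kept coordinates.
    """
    n_params = len(forget_grads[0])
    n_retain = len(retain_grads)
    update, mask = [], []
    for i in range(n_params):
        # Mean gradient of the unlearning objective for parameter i.
        g_u = mean(batch[i] for batch in forget_grads)
        # Mean and standard error of the retain-loss gradient for parameter i.
        r_vals = [batch[i] for batch in retain_grads]
        g_r = mean(r_vals)
        se_r = stdev(r_vals) / math.sqrt(n_retain)
        sign_u = (g_u > 0) - (g_u < 0)
        # Stepping along -g_u does not increase the retain loss (to first order)
        # when sign(g_u) * g_r >= 0; require this for the lower confidence bound.
        keep = sign_u != 0 and sign_u * g_r - z * se_r >= 0
        mask.append(keep)
        update.append(-g_u if keep else 0.0)
    return update, mask

# Toy usage with three parameters and two batches per objective.
forget = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
retain = [[2.0, -2.0, 3.0], [2.2, -2.2, -2.5]]
direction, kept = masked_feasible_update(forget, retain)
```

In this toy example only the first parameter is kept: the second has a retain gradient that opposes the step, and the third is too noisy to pass the confidence test. The resulting direction has a negative inner product with the mean unlearning gradient and a non-positive one with the mean retain gradient, which is the first-order reading of "helps unlearning without degrading utility" assumed here.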
