Machine Unlearning in Contrastive Learning
Zixin Wang, Kongyang Chen
TL;DR
The paper addresses privacy-driven data deletion by proposing a gradient-penalty-based approximate unlearning method that applies to both contrastive (self-supervised) and supervised models. It builds a two-stage loss framework: the first stage combines MEMtrain with a gradient-penalty term, and the second simplifies this to MEMtrain+MEMGP, which requires only member-data gradients. This enables forgetting with a modest accuracy loss (about 10%). The approach defends against membership inference attacks and is shown to be effective across contrastive architectures (MoCo, SimCLR, BYOL) and supervised ResNet models, with encoder-focused analyses and visualizations used to validate unlearning. The method is simple to implement, framework-agnostic, and requires only a handful of training epochs, offering a practical path to regulation-compliant data forgetting in modern AI systems.
Abstract
Machine unlearning requires a model to reduce the influence of specified training data while keeping the loss of accuracy to a minimum. Although machine unlearning has been studied extensively in recent years, most work has focused on supervised learning models, leaving contrastive learning models relatively underexplored. Believing that self-supervised learning has the potential to rival or surpass supervised learning, we investigate machine unlearning for contrastive learning models. In this study, we introduce a novel gradient constraint-based approach that trains the model to achieve machine unlearning. Our method requires only a small number of training epochs and the identification of the data to be unlearned. It performs well not only on contrastive learning models but also on supervised learning models, demonstrating its versatility across learning paradigms.
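To make the idea of a gradient constraint concrete, the following is a minimal toy sketch of a generic gradient-norm penalty added to a training loss. It is not the paper's method: the exact MEMtrain and MEMGP terms are not defined in this summary, so the linear model, the squared-error loss, the weight `lam`, and all function names below are assumptions chosen purely for illustration. The sketch only shows how a loss and the squared norm of its own gradient can be combined into one objective and minimized.

```python
# Toy sketch: a squared-gradient-norm penalty added to a training loss.
# The linear model, squared-error loss, and `lam` are illustrative
# assumptions, not the paper's MEMtrain/MEMGP definitions.

def mse_loss(w, batch):
    """Mean squared error of a linear model y_hat = w . x over a batch."""
    total = 0.0
    for x, y in batch:
        pred = sum(wi * xi for wi, xi in zip(w, x))
        total += (pred - y) ** 2
    return total / len(batch)

def mse_grad(w, batch):
    """Analytic gradient of mse_loss with respect to w."""
    grad = [0.0] * len(w)
    for x, y in batch:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            grad[j] += 2.0 * err * xj / len(batch)
    return grad

def penalized_objective(w, batch, lam):
    """Loss plus lam times the squared L2 norm of the loss gradient."""
    g = mse_grad(w, batch)
    return mse_loss(w, batch) + lam * sum(gj * gj for gj in g)

def numeric_grad(f, w, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at w."""
    grad = []
    for j in range(len(w)):
        up = list(w)
        up[j] += eps
        down = list(w)
        down[j] -= eps
        grad.append((f(up) - f(down)) / (2.0 * eps))
    return grad

def train(w, batch, lam, lr=0.01, steps=200):
    """Plain gradient descent on the penalized objective."""
    for _ in range(steps):
        g = numeric_grad(lambda v: penalized_objective(v, batch, lam), w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

# Hypothetical toy batch of (features, target) pairs.
batch = [([1.0, 0.0], 1.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 0.5)]
w0 = [0.0, 0.0]
w1 = train(w0, batch, lam=0.1)
```

In a deep-learning framework the finite-difference step would normally be replaced by automatic differentiation through the gradient itself; finite differences are used here only to keep the sketch dependency-free.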
