Machine Unlearning in Contrastive Learning

Zixin Wang, Kongyang Chen

TL;DR

The paper addresses privacy-driven data deletion by proposing a gradient-penalty-based approximate unlearning method that applies to both contrastive (self-supervised) and supervised models. It develops the loss in two stages: it starts from MEMtrain combined with a gradient-penalty term, then simplifies this to MEMtrain+MEMGP, which requires gradients of the member data only. This enables forgetting with a limited accuracy loss (~10%). The approach defends against membership inference attacks, is shown to be effective on contrastive architectures (MoCo, SimCLR, BYOL) and on supervised ResNet models, and is validated through encoder-focused analyses and visualizations. The method is simple to implement, framework-agnostic, and needs only a handful of training epochs, offering a practical path to regulation-compliant data forgetting in modern AI systems.
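
To make the membership inference criterion concrete, here is a minimal sketch of one common form of the attack: a loss-threshold test that guesses "member" whenever a sample's loss is low. This is an illustrative assumption, not necessarily the attack evaluated in the paper. The function names, the fixed threshold, and all loss values are hypothetical and made up for illustration.

```python
def loss_threshold_attack(losses, threshold):
    """Guess 'member' (True) for every sample whose loss falls below the threshold."""
    return [loss < threshold for loss in losses]


def attack_accuracy(member_losses, nonmember_losses, threshold):
    """Fraction of correct guesses over known members and known non-members."""
    member_guesses = loss_threshold_attack(member_losses, threshold)
    nonmember_guesses = loss_threshold_attack(nonmember_losses, threshold)
    # A guess is correct when a member is flagged, or a non-member is not flagged.
    correct = sum(member_guesses) + sum(not g for g in nonmember_guesses)
    return correct / (len(member_losses) + len(nonmember_losses))


# Hypothetical per-sample losses (illustration only, not results from the paper).
members_before_unlearning = [0.05, 0.10, 0.08, 0.12]  # memorized: very low loss
members_after_unlearning = [0.70, 1.10, 0.60, 0.95]   # now resemble unseen data
nonmembers = [0.90, 0.65, 1.20, 0.75]

acc_before = attack_accuracy(members_before_unlearning, nonmembers, threshold=0.5)
acc_after = attack_accuracy(members_after_unlearning, nonmembers, threshold=0.5)
print(acc_before, acc_after)
```

Under this kind of test, unlearning is considered effective when the attack can no longer separate the forgotten samples from data the model never saw, so its accuracy on a balanced set falls toward chance (0.5). A real attacker would calibrate the threshold rather than fix it in advance.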

Abstract

Machine unlearning is a complex process that requires the model to diminish the influence of the training data while keeping the loss of accuracy to a minimum. Despite numerous studies on machine unlearning in recent years, most have focused on supervised learning models, leaving contrastive learning models relatively underexplored. Convinced that self-supervised learning holds promising potential, rivaling or surpassing that of supervised learning, we set out to investigate machine unlearning methods centered on contrastive learning models. In this study, we introduce a novel gradient constraint-based approach that trains the model to achieve machine unlearning effectively. Our method requires only a minimal number of training epochs and the identification of the data slated for unlearning. Our approach performs well not only on contrastive learning models but also on supervised learning models, showing its versatility and adaptability across learning paradigms.

Paper Structure

This paper contains 11 sections, 3 equations, 9 figures, 7 tables, 2 algorithms.

Figures (9)

  • Figure 1: Before applying the gradient penalty
  • Figure 2: After applying the gradient penalty
  • Figure 3: Distribution of predicted probabilities for models with different degrees of overfitting
  • Figure 4: Change in cosine similarity between training and non-training data in the later stages, before machine unlearning is performed
  • Figure 5: Change in loss for a batch of training and non-training data before and after machine unlearning
  • ...and 4 more figures