MeGU: Machine-Guided Unlearning with Target Feature Disentanglement

Haoyu Wang, Zhuo Huang, Xiaolong Wang, Bo Han, Zhiwei Lin, Tongliang Liu

TL;DR

We propose Machine-Guided Unlearning (MeGU), a novel framework that guides unlearning through concept-aware re-alignment using Multi-modal Large Language Models. It enables controlled and selective forgetting, effectively mitigating both under-unlearning and over-unlearning.

Abstract

The growing concern over training data privacy has elevated the "Right to be Forgotten" into a critical requirement, thereby raising the demand for effective Machine Unlearning. However, existing unlearning approaches commonly suffer from a fundamental trade-off: aggressively erasing the influence of target data often degrades model utility on retained data, while conservative strategies leave residual target information intact. In this work, the intrinsic representation properties learned during model pretraining are analyzed. It is demonstrated that semantic class concepts are entangled at the feature-pattern level, sharing associated features while preserving concept-specific discriminative components. This entanglement fundamentally limits the effectiveness of existing unlearning paradigms. Motivated by this insight, we propose Machine-Guided Unlearning (MeGU), a novel framework that guides unlearning through concept-aware re-alignment. Specifically, Multi-modal Large Language Models (MLLMs) are leveraged to explicitly determine re-alignment directions for target samples by assigning semantically meaningful perturbing labels. To improve efficiency, inter-class conceptual similarities estimated by the MLLM are encoded into a lightweight transition matrix. Furthermore, MeGU introduces a positive-negative feature noise pair to explicitly disentangle target concept influence. During finetuning, the negative noise suppresses target-specific feature patterns, while the positive noise reinforces remaining associated features and aligns them with perturbing concepts. This coordinated design enables selective disruption of target-specific representations while preserving shared semantic structures. As a result, MeGU enables controlled and selective forgetting, effectively mitigating both under-unlearning and over-unlearning.
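
As a rough illustration of the transition-matrix idea described in the abstract, the sketch below turns a table of inter-class similarity scores into row-stochastic transition probabilities and then samples a perturbing label for a target-class sample. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `transition_matrix` and `perturbing_label`, the softmax normalization with a `temperature` parameter, and the rule of multiplying the transition row by the model's predicted probabilities are all hypothetical choices made for illustration. The abstract states only that MLLM-estimated similarities are encoded into a lightweight transition matrix.

```python
import math
import random


def transition_matrix(similarity, temperature=1.0):
    """Convert inter-class similarity scores into transition probabilities.

    Assumption: each row is a softmax over the off-diagonal similarities,
    so a class never transitions to itself and each row sums to 1.
    """
    n = len(similarity)
    matrix = []
    for i in range(n):
        weights = [
            0.0 if j == i else math.exp(similarity[i][j] / temperature)
            for j in range(n)
        ]
        total = sum(weights)
        matrix.append([w / total for w in weights])
    return matrix


def perturbing_label(target_class, matrix, model_probs, rng):
    """Sample a non-target label for one target-class sample.

    Assumption: the transition row is combined with the model's predicted
    probabilities by elementwise product before sampling.
    """
    scores = [t * p for t, p in zip(matrix[target_class], model_probs)]
    if sum(scores) == 0.0:
        # Fall back to the transition row alone if the product vanishes.
        scores = matrix[target_class]
    return rng.choices(range(len(scores)), weights=scores, k=1)[0]


# Hypothetical similarity scores for three classes (illustrative values only).
similarity = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
T = transition_matrix(similarity)
rng = random.Random(0)
label = perturbing_label(0, T, [0.6, 0.3, 0.1], rng)
```

Because the diagonal of the matrix is zero, a target sample is never reassigned to its own class, and classes that are conceptually closer to the target receive proportionally more of the perturbing labels.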

Paper Structure (35 sections, 12 equations, 8 figures, 8 tables, 2 algorithms)

Figures (8)

  • Figure 1: Feature entanglement across different concepts. Taking dinosaur and wolf as an example, the two classes share similar features (marked in red) while each possesses unique features (green). Assuming dinosaur is the class to be forgotten, the goal of our method is to disentangle its features, forgetting the target concept's unique features while preserving the associated features shared with the retained concepts.
  • Figure 2: The proposed unlearning framework MeGU. The MLLM is employed to estimate conceptual similarities from a small subset of the training data. These similarities are combined with the model's predictions to determine the perturbing labels. The Fragment-Align strategy leverages a pair of feature noises, trained on the frozen pretrained model, to suppress the unique features of the target data and to enhance those associated with the retained data, respectively. Unlearning is achieved by aligning the target data towards the perturbing labels while disentangling their influence.
  • Figure 3: The visualized transition matrix.
  • Figure 4: Time consumed for CIFAR-100 class-wise unlearning.
  • Figure 5: The distribution of perturbing labels for class-wise unlearning designating class 2 as target on CIFAR-10 with ResNet18.
  • ...and 3 more figures