Class Machine Unlearning for Complex Data via Concepts Inference and Data Poisoning

Wenhan Chang, Tianqing Zhu, Heng Xu, Wenjian Liu, Wanlei Zhou

TL;DR

Privacy rights and costly retraining motivate the need for efficient unlearning of training data. The authors propose a concept-based unlearning pipeline that uses a Post-hoc Concept Bottleneck Model to locate influential concepts and applies data-poisoning-based fine-tuning to erase those concepts, demonstrated on image classifiers and large language models with minimal utility loss. Key contributions include introducing concept-based unlearning, integrating poisoning unlearning, and validating on CIFAR-10/100 and Vicuna models with strong forgetting while preserving utility; privacy analyses address both model- and data-level privacy. The work presents a scalable, targeted forgetting mechanism for complex data in practical privacy contexts, enabling safer deployment of AI systems without full retraining.

Abstract

In the current AI era, users may ask AI companies to delete their data from training datasets because of privacy concerns. For a model owner, retraining a model consumes significant computational resources. Machine unlearning is therefore an emerging technology that allows a model owner to delete requested training data, or an entire class, with little effect on model performance. However, for large-scale complex data such as images or text, unlearning a class often degrades performance because the link between classes and the model is difficult to identify. Inaccurate class deletion may lead to over- or under-unlearning. In this paper, to define the unlearning class of complex data accurately, we use the notion of a concept, rather than an image feature or a text token, to represent the semantic information of the unlearning class. This representation can cut the link between the model and the class, leading to complete erasure of the class's impact. To analyze the impact of concepts in complex data, we adopt a Post-hoc Concept Bottleneck Model and Integrated Gradients to precisely identify concepts across different classes. Next, we use data poisoning with random and targeted labels to build our unlearning methods. We test our methods on both image classification models and large language models (LLMs). The results consistently show that the proposed methods accurately erase the targeted information from models while largely maintaining model performance.
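
The abstract mentions data poisoning with random and targeted labels. As a rough illustration of the label side only, the sketch below relabels every sample of the class to be forgotten, either with a randomly chosen other class or with one fixed target class. The function name `poison_labels` and its parameters are hypothetical and not taken from the paper, and the paper's methods also alter the inputs themselves (concept masking and triggers), which this sketch does not attempt.

```python
import random


def poison_labels(labels, forget_class, num_classes,
                  mode="random", target_class=None, seed=0):
    """Return a copy of `labels` with every `forget_class` entry relabeled.

    Hypothetical helper for illustration; not the authors' implementation.
    mode="random":   each forgotten sample gets a random *other* class.
    mode="targeted": each forgotten sample gets the fixed `target_class`.
    """
    if mode not in ("random", "targeted"):
        raise ValueError("mode must be 'random' or 'targeted'")
    if mode == "targeted" and (target_class is None
                               or target_class == forget_class):
        raise ValueError("targeted mode needs a target_class "
                         "different from forget_class")

    rng = random.Random(seed)  # seeded so the poisoning is reproducible
    other_classes = [c for c in range(num_classes) if c != forget_class]

    poisoned = []
    for y in labels:
        if y != forget_class:
            poisoned.append(y)            # retained classes are untouched
        elif mode == "random":
            poisoned.append(rng.choice(other_classes))
        else:
            poisoned.append(target_class)
    return poisoned


if __name__ == "__main__":
    labels = [0, 1, 2, 1, 3, 1]
    print(poison_labels(labels, forget_class=1, num_classes=4))
    print(poison_labels(labels, forget_class=1, num_classes=4,
                        mode="targeted", target_class=3))
```

Under these assumptions, fine-tuning on the relabeled samples would push the model away from its original prediction for the forgotten class, while samples of the retained classes keep their labels.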

Paper Structure (41 sections, 20 equations, 9 figures, 7 tables)

This paper contains 41 sections, 20 equations, 9 figures, 7 tables.

Figures (9)

  • Figure 1: Unlearning concepts in different types of data. For image data, we perform poisoning unlearning by exploiting the model's confusion about the concept that corresponds to a specific feature. For text data, we perform poisoning unlearning by masking the sensitive information identified in the data; after fine-tuning, the LLM forgets the original sensitive information.
  • Figure 2: Overview of our machine unlearning method. The process begins with users initiating an unlearning request; yellow arrows indicate the method flow, while blue arrows represent the specific operations.
  • Figure 3: Conducting data poisoning unlearning on the target unlearning data. We embed images with disruptive attributes as triggers into target class images, directly affecting their primary features, thereby achieving the objectives of data poisoning and machine unlearning.
  • Figure 4: The process of poisoning image data. After analyzing the importance of the concept of the target class data, we utilize the phenomenon of feature confusion to mask the main concept in the original data.
  • Figure 5: The process of poisoning language data. After analyzing the importance of the concepts in the target data, we mask the sensitive information before the subsequent fine-tuning.
  • ...and 4 more figures