Machine Unlearning in Generative AI: A Survey

Zheyuan Liu, Guangyao Dou, Zhaoxuan Tan, Yijun Tian, Meng Jiang

TL;DR

This survey defines a principled framework for GenAI machine unlearning, introducing three objectives (Accuracy, Locality, and Generalizability) and categorizing methods into Parameter Optimization and In-Context Unlearning across generative image models, LLMs, and multimodal models. It provides a taxonomy of parameter optimization techniques (gradient-based, knowledge distillation, data sharding, extra layers, task vectors, PEMs) and in-context unlearning techniques (prompt-based and memory-augmented approaches), along with datasets and benchmarks covering safety, privacy, copyright, hallucination, and bias. The paper highlights key challenges such as copyright unlearning, knowledge entanglement, and the locality-generalizability trade-off, and outlines future directions including target consistency, robust unlearning, and reliable evaluation. By linking methodology, evaluation, datasets, and applications, it offers a roadmap for advancing safe, privacy-preserving, and trustworthy GenAI systems.

Abstract

Generative AI technologies have been deployed in many places, such as (multimodal) large language models and vision generative models. Their remarkable performance is attributed to massive training data and emergent reasoning abilities. However, these models can memorize and generate sensitive, biased, or dangerous information originating from the training data, especially data collected by web crawling. New machine unlearning (MU) techniques are being developed to reduce or eliminate undesirable knowledge and its effects from the models, because techniques designed for traditional classification tasks cannot be applied to Generative AI. We offer a comprehensive survey of MU in Generative AI, covering a new problem formulation, evaluation methods, and a structured discussion of the advantages and limitations of different kinds of MU techniques. We also present several critical challenges and promising directions in MU research. A curated list of readings can be found at https://github.com/franciscoliu/GenAI-MU-Reading.

Paper Structure

This paper contains 61 sections, 22 equations, 3 figures, and 4 tables.

Figures (3)

  • Figure 1: Problems of contemporary Generative Models in various scenarios.
  • Figure 2: Overall assessment of three dimensions in the context of harmful unlearning for LLaMA2-7B (Touvron et al., 2023). Accuracy and generalizability metrics are calculated based on the model's safety rate after applying each approach, i.e., GA (Thudi et al., 2022), GA+Mismatch (Yao et al., 2023), SKU (Liu et al., 2024), and Task Vector (Ilharco et al., 2022). The preserved knowledge dimension is represented by Massive Multitask Language Understanding (MMLU) (Hendrycks et al., 2020), which has been normalized to the same scale as accuracy and generalizability for consistent comparison.
  • Figure 3: Different types of Generative models.