UnlearnCanvas: Stylized Image Dataset for Enhanced Machine Unlearning Evaluation in Diffusion Models

Yihua Zhang, Chongyu Fan, Yimeng Zhang, Yuguang Yao, Jinghan Jia, Jiancheng Liu, Gaoyuan Zhang, Gaowen Liu, Ramana Rao Kompella, Xiaoming Liu, Sijia Liu

TL;DR

UnlearnCanvas provides a ground-truth–driven benchmark and dataset for evaluating diffusion-model unlearning of artistic styles and associated objects, addressing critical gaps in existing MU evaluations. The authors introduce a dual-supervised, high-resolution dataset (60 styles × 20 objects) and a 7-metric evaluation framework, enabling systematic assessment of unlearning effectiveness, retainability, generation quality, and efficiency. Benchmarking nine state-of-the-art MU methods reveals that no method is best across all metrics, with pronounced gaps in cross-domain retainability and robustness to adversarial prompts; sequential unlearning exhibits rebound and catastrophic retaining failures. The work demonstrates UnlearnCanvas’s utility beyond MU, e.g., for style transfer benchmarking and bias-mitigation studies, and emphasizes the need for rigorous, standardized evaluation to guide the development of safer, more robust diffusion-model unlearning techniques.

Abstract

The technological advancements in diffusion models (DMs) have demonstrated unprecedented capabilities in text-to-image generation and are widely used in diverse applications. However, they have also raised significant societal concerns, such as the generation of harmful content and copyright disputes. Machine unlearning (MU) has emerged as a promising solution, capable of removing undesired generative capabilities from DMs. However, existing MU evaluation systems present several key challenges that can result in incomplete and inaccurate assessments. To address these issues, we propose UnlearnCanvas, a comprehensive high-resolution stylized image dataset that facilitates the evaluation of the unlearning of artistic styles and associated objects. This dataset enables the establishment of a standardized, automated evaluation framework with 7 quantitative metrics assessing various aspects of the unlearning performance for DMs. Through extensive experiments, we benchmark 9 state-of-the-art MU methods for DMs, revealing novel insights into their strengths, weaknesses, and underlying mechanisms. Additionally, we explore challenging unlearning scenarios for DMs to evaluate worst-case performance against adversarial prompts, the unlearning of finer-scale concepts, and sequential unlearning. We hope that this study can pave the way for developing more effective, accurate, and robust DM unlearning methods, ensuring safer and more ethical applications of DMs in the future. The dataset, benchmark, and codes are publicly available at https://unlearn-canvas.netlify.app/.

Paper Structure (54 sections, 22 figures, 9 tables)

Figures (22)

  • Figure 1: (a) An illustration of MU for DMs. (b) Overview of experiment settings and benchmark results. This benchmark focuses on three categories of quantitative metrics: the unlearning effectiveness (UA, Rob., FU, SU); the retainability of innocent knowledge (IRA, CRA, FR, SR); and the image generation quality (FID). Results are normalized to $0\%\sim100\%$ per metric. No single method excels across all metrics. See a summary of these metrics in Tab. \ref{tab: metrics_summary} and more results in Sec. \ref{sec: mu_experiment_results}.
  • Figure 2: An illustration of machine unlearning using UnlearnCanvas. Concepts in the knowledge bank are categorized into different domains (style and object) and serve as potential unlearning targets. When one concept is unlearned, the remaining concepts in both the same and different domains must be retained.
  • Figure 3: Illustration of curating UnlearnCanvas.
  • Figure 4: Illustration of in-domain and cross-domain retainability evaluation, with the Van Gogh style as the unlearning target. ✓ and ✗ indicate satisfactory and undesired results post unlearning.
  • Figure 5: An illustration of the evaluation pipeline proposed in this work using UnlearnCanvas when unlearning a specific target concept 'Van Gogh Style'. Unlearning effectiveness and retainability are quantitatively assessed (marked in blue) to give an accurate picture of unlearning performance. The unlearning target can traverse all styles and objects to achieve a comprehensive evaluation.
  • ...and 17 more figures
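
The captions for Figures 2, 4, and 5 describe an evaluation in which each style or object takes a turn as the unlearning target, while the remaining concepts are checked for retention both in the same domain and in the other domain. The sketch below illustrates only that bookkeeping. It is not the authors' code: the function names are hypothetical, and every concept name other than "Van Gogh" is a placeholder standing in for the full 60 styles and 20 objects.

```python
# Hypothetical subsets for illustration; the dataset has 60 styles and 20 objects.
STYLES = ["Van Gogh", "Cubism", "Sketch"]   # only "Van Gogh" appears in the source
OBJECTS = ["Dogs", "Cats", "Towers"]        # placeholder object names


def split_retain_sets(target, styles, objects):
    """Return (in_domain, cross_domain) concepts to retain for one target.

    In-domain: other concepts of the same kind as the target.
    Cross-domain: all concepts of the other kind.
    """
    if target in styles:
        in_domain = [s for s in styles if s != target]
        cross_domain = list(objects)
    elif target in objects:
        in_domain = [o for o in objects if o != target]
        cross_domain = list(styles)
    else:
        raise ValueError(f"unknown concept: {target!r}")
    return in_domain, cross_domain


def iter_unlearning_targets(styles, objects):
    """Traverse every style and object as the unlearning target."""
    for target in list(styles) + list(objects):
        in_domain, cross_domain = split_retain_sets(target, styles, objects)
        yield target, in_domain, cross_domain


if __name__ == "__main__":
    for target, in_dom, cross_dom in iter_unlearning_targets(STYLES, OBJECTS):
        print(f"unlearn {target!r}: retain in-domain {in_dom}, cross-domain {cross_dom}")
```

In a full pipeline, each yielded triple would drive image generation and classification for the target and for both retain sets; those steps are omitted here because they depend on the specific model and classifier.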