
Prompting Forgetting: Unlearning in GANs via Textual Guidance

Piyush Nagasubramaniam, Neeraj Karamchandani, Chen Wu, Sencun Zhu

TL;DR

This work addresses selective forgetting in GANs by proposing Text-to-Unlearn, a text-guided, cross-modal framework that unlearns concepts from a pre-trained GAN without access to the original training data. It adopts a directional unlearning strategy. First, it computes a reference direction $\vec{i}$ in the CLIP embedding space from latent pairs. It then fine-tunes a trainable generator $G_t$ along this direction, using a frozen reference generator $G_f$ and a latent mapper $M_p$, guided by losses including $\mathcal{L}_{dir}$, $\mathcal{L}_{LPIPS}$, and $\mathcal{L}_{id}$. A key contribution is the degree-of-unlearning metric $\gamma$. It is based on the Wasserstein-1 distance between image-text score distributions from state-of-the-art vision-language models (VLMs), and it enables scalable, quantitative evaluation of unlearning on both in-domain and out-of-domain data. The results show effective, disentangled unlearning of features and expressions while the generator's other capabilities are preserved, with demonstrated robustness over baselines. They also suggest a path toward debiasing VLMs and extending the method to broader GAN families. The method offers a practical, dataset-free approach to model governance for high-resolution face generation and other sensitive content domains.
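
The directional term $\mathcal{L}_{dir}$ can be illustrated with a minimal sketch. It assumes, by analogy with CLIP-guided directional losses such as StyleGAN-NADA's, that $\mathcal{L}_{dir}$ is one minus the cosine similarity between two vectors. The first is $\vec{i}$. The second is the CLIP-space shift between the image that $G_f$ produces for a latent code and the image that $G_t$ produces for the same code. The paper's exact formulation may differ. Plain Python lists stand in for CLIP image embeddings. The function `total_loss` and its weights `lambda_lpips` and `lambda_id` are hypothetical. $\mathcal{L}_{LPIPS}$ and $\mathcal{L}_{id}$ are passed in as precomputed scalars.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float], eps: float = 1e-12) -> float:
    """Cosine similarity; returns 0.0 when either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Guard against a zero shift, e.g. at the first step when G_t still equals G_f.
    return dot / max(norm, eps)


def directional_loss(emb_trainable: Sequence[float],
                     emb_frozen: Sequence[float],
                     ref_direction: Sequence[float]) -> float:
    """Assumed form of L_dir: 1 - cos(delta, i), where delta is the CLIP-space shift
    between the images the trainable and frozen generators produce for one latent."""
    delta = [a - b for a, b in zip(emb_trainable, emb_frozen)]
    return 1.0 - cosine(delta, ref_direction)


def total_loss(l_dir: float, l_lpips: float, l_id: float,
               lambda_lpips: float = 1.0, lambda_id: float = 1.0) -> float:
    """Hypothetical weighted sum; the lambda values are placeholders, not the paper's."""
    return l_dir + lambda_lpips * l_lpips + lambda_id * l_id


# Usage: a shift exactly along i gives a loss of 0; an orthogonal shift gives 1.
ref = [1.0, 0.0, 0.0]
print(directional_loss([1.0, 0.5, 0.1], [0.0, 0.5, 0.1], ref))
print(directional_loss([0.0, 1.5, 0.1], [0.0, 0.5, 0.1], ref))
```

$G_t$ starts as a copy of $G_f$, so the shift is zero at the first step. In that case the sketch returns a loss of 1 rather than dividing by zero.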

Abstract

State-of-the-art generative models exhibit powerful image-generation capabilities, introducing various ethical and legal challenges to service providers hosting these models. Consequently, Content Removal Techniques (CRTs) have emerged as a growing area of research to control outputs without full-scale retraining. Recent work has explored the use of Machine Unlearning in generative models to address content removal. However, the focus of such research has been on diffusion models, and unlearning in Generative Adversarial Networks (GANs) has remained largely unexplored. We address this gap by proposing Text-to-Unlearn, a novel framework that selectively unlearns concepts from pre-trained GANs using only text prompts, enabling feature unlearning, identity unlearning, and fine-grained tasks like expression and multi-attribute removal in models trained on human faces. Leveraging natural language descriptions, our approach guides the unlearning process without requiring additional datasets or supervised fine-tuning, offering a scalable and efficient solution. To evaluate its effectiveness, we introduce an automatic unlearning assessment method adapted from state-of-the-art image-text alignment metrics, providing a comprehensive analysis of the unlearning methodology. To our knowledge, Text-to-Unlearn is the first cross-modal unlearning framework for GANs, representing a flexible and efficient advancement in managing generative model behavior.
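
The degree of unlearning $\gamma$ described in the TL;DR rests on the Wasserstein-1 distance between two distributions of image-text scores. The sketch below computes that distance for two equal-size samples. It uses hypothetical scores for a single prompt on images from the generator before and after unlearning. It does not reproduce how the paper obtains the scores from a VLM, or any normalization the paper applies to $\gamma$.

```python
from typing import Sequence


def wasserstein1(scores_a: Sequence[float], scores_b: Sequence[float]) -> float:
    """Wasserstein-1 distance between two equal-size empirical 1-D distributions.

    For equally weighted samples of the same size, W1 reduces to the mean absolute
    difference between the sorted samples (i.e. between matching quantiles)."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("expects two non-empty samples of equal size")
    a, b = sorted(scores_a), sorted(scores_b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Hypothetical image-text scores for one prompt, before and after unlearning.
scores_before = [3.0, 2.0, 4.0]
scores_after = [1.0, 0.0, 2.0]
print(wasserstein1(scores_before, scores_after))  # 2.0
```

For unequal sample sizes, `scipy.stats.wasserstein_distance(u_values, v_values)` computes the same one-dimensional distance.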

Paper Structure

This paper contains 20 sections, 8 equations, 12 figures, 5 tables, 3 algorithms.

Figures (12)

  • Figure 1: Examples of StyleCLIP manipulations of an image. The driving text for the edit is listed below each image.
  • Figure 2: Overview of the Text-to-Unlearn framework for unlearning the feature "purple hair" as an example. In the first phase, a reference direction to guide the unlearning is precomputed once. In the second phase, the precomputed reference direction is used to steer the trainable generator's synthesis network away from generating undesirable images.
  • Figure 3: Examples of image embeddings in the CLIP space during the fine-tuning of $G_{t}$. $\vec{i}$ is the precomputed reference direction, and $\vec{j_1}$ and $\vec{j_T}$ are the alignments during and at the end of training, respectively.
  • Figure 4: Qualitative comparison of generated images before and after unlearning features based on the text prompt (below each grid). The second column shows an image synthesized from a latent code that exhibits the undesirable feature, and the third column shows the image synthesized from the same latent code after unlearning.
  • Figure 5: Qualitative comparison of generated images before and after identity unlearning. The first column shows source samples from StyleGAN2. The second column shows images generated by applying the driving text (below each grid) to the source samples before unlearning. The third column shows the images for the same points after unlearning.
  • ...and 7 more figures
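
The first phase in Figure 2 computes the reference direction $\vec{i}$ once from latent pairs. It might look like the following sketch, which rests on two assumptions that are not the paper's stated procedure. First, each pair yields a CLIP embedding of an image that shows the concept to forget (source) and an embedding of a corresponding image without it (target). Second, $\vec{i}$ is the renormalized average of the per-pair unit differences. The helper names `normalize` and `reference_direction` are hypothetical. During fine-tuning, the alignment of the training-time directions shown in Figure 3 ($\vec{j_1}$, $\vec{j_T}$) with $\vec{i}$ could then be monitored with a cosine similarity.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def reference_direction(
        pairs: Sequence[Tuple[Sequence[float], Sequence[float]]]) -> Vector:
    """Hypothetical construction of i: sum the unit CLIP-space differences
    (target embedding - source embedding) over all latent pairs, then renormalize."""
    if not pairs:
        raise ValueError("need at least one latent pair")
    total = [0.0] * len(pairs[0][0])
    for src, tgt in pairs:
        unit = normalize([t - s for s, t in zip(src, tgt)])
        total = [acc + u for acc, u in zip(total, unit)]
    return normalize(total)


# Two pairs whose differences point up-right and down-right average to the +x axis.
i = reference_direction([([0.0, 0.0], [1.0, 1.0]), ([0.0, 0.0], [1.0, -1.0])])
print(i)
```

Averaging unit differences rather than raw differences keeps any single pair with an unusually large embedding shift from dominating the direction.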