Prompting Forgetting: Unlearning in GANs via Textual Guidance
Piyush Nagasubramaniam, Neeraj Karamchandani, Chen Wu, Sencun Zhu
TL;DR
This work addresses selective forgetting in GANs by proposing Text-to-Unlearn, a text-guided, cross-modal framework that unlearns concepts from a pre-trained GAN without access to the original data. It adopts a directional unlearning strategy: it first computes a reference direction $\vec{i}$ in the CLIP embedding space using latent pairs, then fine-tunes a trainable generator $G_t$ along this direction with a frozen reference generator $G_f$ and a latent mapper $M_p$, guided by losses including $\mathcal{L}_{dir}$, $\mathcal{L}_{LPIPS}$, and $\mathcal{L}_{id}$. A key contribution is the degree-of-unlearning metric $\gamma$, based on the Wasserstein-1 distance between image-text score distributions from state-of-the-art vision-language models (VLMs), which enables scalable, quantitative evaluation of unlearning on in-domain and out-of-domain data. The results show effective, disentangled unlearning of features and expressions while preserving other capabilities, and greater robustness than the baselines. Future directions include debiasing VLMs and extending the approach to broader GAN families. The method offers a practical, dataset-free approach to model governance for high-resolution face generation and other sensitive content domains.
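To make the directional strategy concrete, the sketch below shows one common form of a CLIP-space directional loss: one minus the cosine similarity between the embedding shift from $G_f$ to $G_t$ and the reference direction $\vec{i}$. This summary names $\mathcal{L}_{dir}$ but does not state its formula, so the form used here is an assumption. The function names are hypothetical, and plain Python lists stand in for CLIP image embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def directional_loss(emb_trainable, emb_frozen, ref_direction):
    """Assumed form: 1 - cos(E(G_t(w)) - E(G_f(w)), i).

    emb_trainable: embedding of the image from the trainable generator G_t
    emb_frozen:    embedding of the image from the frozen generator G_f
    ref_direction: reference direction i in the same embedding space
    """
    delta = [a - b for a, b in zip(emb_trainable, emb_frozen)]
    return 1.0 - cosine_similarity(delta, ref_direction)


# Toy 2-D example: the shift [1, 0] is parallel to the reference direction.
aligned = directional_loss([2.0, 1.0], [1.0, 1.0], [1.0, 0.0])
# The shift [-1, 0] points the opposite way.
opposed = directional_loss([0.0, 1.0], [1.0, 1.0], [1.0, 0.0])
```

Under this assumed form, the loss is smallest when fine-tuning moves the generated image's embedding exactly along $\vec{i}$ and largest when it moves in the opposite direction.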
Abstract
State-of-the-art generative models exhibit powerful image-generation capabilities, introducing various ethical and legal challenges to service providers hosting these models. Consequently, Content Removal Techniques (CRTs) have emerged as a growing area of research to control outputs without full-scale retraining. Recent work has explored the use of Machine Unlearning in generative models to address content removal. However, the focus of such research has been on diffusion models, and unlearning in Generative Adversarial Networks (GANs) has remained largely unexplored. We address this gap by proposing Text-to-Unlearn, a novel framework that selectively unlearns concepts from pre-trained GANs using only text prompts, enabling feature unlearning, identity unlearning, and fine-grained tasks like expression and multi-attribute removal in models trained on human faces. Leveraging natural language descriptions, our approach guides the unlearning process without requiring additional datasets or supervised fine-tuning, offering a scalable and efficient solution. To evaluate its effectiveness, we introduce an automatic unlearning assessment method adapted from state-of-the-art image-text alignment metrics, providing a comprehensive analysis of the unlearning methodology. To our knowledge, Text-to-Unlearn is the first cross-modal unlearning framework for GANs, representing a flexible and efficient advancement in managing generative model behavior.
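The TL;DR names the Wasserstein-1 distance between image-text score distributions as the basis of the metric $\gamma$. The sketch below computes that distance for two equal-size samples of scalar scores, which is only the building block: the exact definition and normalization of $\gamma$ are not given in this section and are not reproduced here. The function name and the toy score values are hypothetical.

```python
def wasserstein_1(scores_a, scores_b):
    """Wasserstein-1 distance between two equal-size empirical samples.

    For 1-D samples of the same size with uniform weights, the distance
    equals the mean absolute difference between the sorted values.
    """
    if not scores_a or len(scores_a) != len(scores_b):
        raise ValueError("samples must be non-empty and of equal size")
    pairs = zip(sorted(scores_a), sorted(scores_b))
    return sum(abs(a - b) for a, b in pairs) / len(scores_a)


# Hypothetical image-text scores for a target prompt, before and after unlearning.
scores_before = [0.5, 1.0]
scores_after = [0.0, 0.5]
shift = wasserstein_1(scores_before, scores_after)
```

A larger distance means the score distribution for the target prompt has moved further after unlearning, while a distance near zero means the model's outputs score about the same as before.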
