FADE: Selective Forgetting via Sparse LoRA and Self-Distillation
Carolina R. Kelsch, Leonardo S. B. Pereira, Natnael Mola, Luis H. Arribas, Juan C. S. M. Avedillo
TL;DR
This work tackles selective unlearning in text-to-image diffusion models, motivated by regulatory and safety constraints. FADE takes a two-stage approach. First, a knowledge-localization step uses gradient-based saliency to confine updates to a sparse set of parameters, applied through LoRA adapters. Second, a self-distillation stage overwrites the forgotten concept with a user-defined surrogate, guided by a specialized loss and conditional prompts. The adapters are memory-efficient, can be merged at inference time, and can be removed again, which makes deployment reversible. FADE achieves strong forgetting with high retainability on the UnlearnCanvas benchmark, supported by ablations on multiple datasets. Overall, it offers a practical, controllable, and scalable approach to selective unlearning in diffusion-based image generation, suited to production use.
Abstract
Machine unlearning aims to remove the influence of specific data or concepts from trained models while preserving overall performance, a capability increasingly required by data protection regulations and responsible AI practices. Despite recent progress, unlearning in text-to-image diffusion models remains challenging due to high computational costs and the difficulty of balancing effective forgetting with retention of unrelated concepts. We introduce FADE (Fast Adapter for Data Erasure), a two-stage unlearning method for image generation that combines parameter localization with self-distillation. In the first stage, FADE identifies the parameters most responsible for the forget set using gradient-based saliency and constrains updates to them through sparse LoRA adapters, keeping modifications lightweight and localized. In the second stage, FADE applies a self-distillation objective that overwrites the forgotten concept with a user-defined surrogate while preserving behavior on retained data. The resulting adapters are memory-efficient and reversible: they can be merged or removed at runtime, enabling flexible deployment in production systems. We evaluate FADE on the UnlearnCanvas benchmark and conduct ablation studies on the Imagenette, Labeled Faces in the Wild, AtharvaTaras Dog Breeds, and SUN Attributes datasets, demonstrating state-of-the-art unlearning performance with fine-grained control over the forgetting-retention trade-off. Our results show that FADE achieves strong concept erasure and high retainability across these domains, making it a suitable solution for selective unlearning in diffusion-based image generation models.
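To make the first stage and the reversibility claim concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that saliency is the absolute gradient of a forget-set loss with respect to one weight matrix, that the sparse set is the top fraction of entries by saliency, and that the LoRA update B @ A is simply masked to those entries. The function names, the `keep_fraction` parameter, the matrix sizes, and the random stand-in gradient are all hypothetical.

```python
import numpy as np


def saliency_mask(grad, keep_fraction):
    """Boolean mask selecting the top `keep_fraction` entries of |grad|."""
    k = max(1, int(round(keep_fraction * grad.size)))
    flat = np.abs(grad).ravel()
    top = np.argpartition(flat, -k)[-k:]  # indices of the k largest magnitudes
    mask = np.zeros(grad.size, dtype=bool)
    mask[top] = True
    return mask.reshape(grad.shape)


def sparse_lora_delta(A, B, mask, scale=1.0):
    """Low-rank update B @ A, zeroed outside the salient positions."""
    return scale * (B @ A) * mask


def merge(W, delta):
    """Fold the adapter into the base weights."""
    return W + delta


def unmerge(W_merged, delta):
    """Remove the adapter again, recovering the base weights."""
    return W_merged - delta


rng = np.random.default_rng(0)
W = rng.normal(size=(8, 6))             # stand-in for one weight matrix
grad_forget = rng.normal(size=W.shape)  # stand-in for d(forget loss)/dW
mask = saliency_mask(grad_forget, keep_fraction=0.25)

rank = 2                                # assumed LoRA rank
B = rng.normal(size=(8, rank))
A = rng.normal(size=(rank, 6))
delta = sparse_lora_delta(A, B, mask)

W_merged = merge(W, delta)
W_restored = unmerge(W_merged, delta)
```

In this sketch only the masked entries of W change when the adapter is merged, and subtracting the same delta restores the original weights up to floating-point error, which is the sense in which such an adapter is reversible. In the actual method, A and B would be trained by the second-stage self-distillation objective rather than drawn at random.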
