TRACE: Trajectory-Constrained Concept Erasure in Diffusion Models
Finn Carter
TL;DR
TRACE addresses the problem of unwanted concepts appearing in diffusion-model outputs by introducing a trajectory-constrained concept erasure framework. It combines a closed-form cross-attention edit with a trajectory-aware LoRA finetuning pipeline, enabling targeted removal while preserving unrelated content across both Stable Diffusion and Flux architectures. Theoretical results formalize exact erasure conditions and bound collateral effects, while empirical results across object, celebrity, artistic style, and NSFW tasks demonstrate state-of-the-art performance with minimal quality loss. This work provides a scalable, modular solution for safe deployment of diffusion models and lays groundwork for multi-concept erasure with per-concept adapters and integration regularization.
Abstract
Text-to-image diffusion models have shown unprecedented generative capability, but their ability to produce undesirable concepts (e.g., pornographic content, sensitive identities, copyrighted styles) poses serious concerns for privacy, fairness, and safety. Concept erasure aims to remove or suppress specific concept information in a generative model. In this paper, we introduce TRACE (Trajectory-Constrained Attentional Concept Erasure), a novel method to erase targeted concepts from diffusion models while preserving overall generative quality. Our approach combines a rigorous theoretical framework, establishing formal conditions under which a concept can be provably suppressed in the diffusion process, with an effective finetuning procedure compatible with both conventional latent diffusion (Stable Diffusion) and emerging rectified flow models (e.g., FLUX). We first derive a closed-form update to the model's cross-attention layers that removes hidden representations of the target concept. We then introduce a trajectory-aware finetuning objective that steers the denoising process away from the concept only in the late sampling stages, thus maintaining the model's fidelity on unrelated content. Empirically, we evaluate TRACE on multiple benchmarks used in prior concept erasure studies (object classes, celebrity faces, artistic styles, and explicit content from the I2P dataset). TRACE achieves state-of-the-art performance, outperforming recent methods such as ANT, EraseAnything, and MACE in terms of removal efficacy and output quality.
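To make the idea of a closed-form cross-attention edit concrete, the following is a minimal sketch of a generic regularized least-squares edit of a single projection matrix. It is not TRACE's actual update, which this section does not state. The function name `closed_form_edit`, the choice of an anchor embedding as the remapping target, and the weights `lam` and `mu` are all assumptions made for illustration. The sketch solves for a new matrix that maps each erased embedding to the output the original matrix gives for its anchor, while keeping the outputs for a set of preserved embeddings close to their original values.

```python
import numpy as np

def closed_form_edit(W, erase, anchor, preserve, lam=1.0, mu=1e-6):
    """Illustrative ridge-style edit of a projection matrix W (d_out x d_in).

    Minimizes over W_new:
        sum_i ||W_new c_i - W a_i||^2          (remap erased c_i to anchor a_i)
      + lam * sum_j ||W_new p_j - W p_j||^2    (keep preserved p_j unchanged)
      + mu * ||W_new - W||_F^2                 (stay close to the original W)

    erase, anchor: arrays of shape (n, d_in), paired row by row.
    preserve: array of shape (m, d_in).
    All names and weights here are hypothetical, not taken from the paper.
    """
    d_in = W.shape[1]
    eye = np.eye(d_in)
    # Setting the gradient to zero gives W_new @ den = num.
    num = W @ (anchor.T @ erase + lam * preserve.T @ preserve + mu * eye)
    den = erase.T @ erase + lam * preserve.T @ preserve + mu * eye
    # den is symmetric, so W_new = num @ inv(den) = solve(den, num.T).T
    return np.linalg.solve(den, num.T).T

# Toy example with random embeddings (assumed dimensions, for illustration only).
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))         # stand-in for a key or value projection
erase = rng.standard_normal((1, 16))     # embedding of the concept to erase
anchor = rng.standard_normal((1, 16))    # embedding of a neutral replacement
preserve = rng.standard_normal((4, 16))  # embeddings that should be unaffected

W_new = closed_form_edit(W, erase, anchor, preserve)
print("max remap error:   ", np.abs(W_new @ erase.T - W @ anchor.T).max())
print("max preserve error:", np.abs(W_new @ preserve.T - W @ preserve.T).max())
```

In this toy setting the input dimension exceeds the number of constraints, so both errors should be close to zero. In a real model the preserved set is much larger and the two terms trade off against each other through `lam`.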
