CASteer: Steering Diffusion Models for Controllable Generation

Tatiana Gaintseva, Andreea-Maria Oncescu, Chengcheng Ma, Ziquan Liu, Martin Benning, Gregory Slabaugh, Jiankang Deng, Ismail Elezi

TL;DR

This work introduces CASteer, a training-free framework for controllable concept erasure in diffusion models. It builds concept-specific steering vectors from paired positive/negative prompts and applies them to cross-attention (CA) outputs during inference, enabling selective suppression of unwanted concepts while preserving overall image quality. By targeting CA layers and using a projection-based suppression scheme, CASteer achieves state-of-the-art erasure across abstract and concrete concepts and across diverse backbones (SD-1.4, SDXL, SANA) and distilled variants, with extensions to multiple concepts and implicit prompts. The approach offers a practical, scalable tool for safety and content control in generative imaging, with clear pathways for integration and future theoretical grounding.

Abstract

Diffusion models have transformed image generation, yet controlling their outputs to reliably erase undesired concepts remains challenging. Existing approaches usually require task-specific training and struggle to generalize across both concrete (e.g., objects) and abstract (e.g., styles) concepts. We propose CASteer (Cross-Attention Steering), a training-free framework for concept erasure in diffusion models using steering vectors to influence hidden representations dynamically. CASteer precomputes concept-specific steering vectors by averaging neural activations from images generated for each target concept. During inference, it dynamically applies these vectors to suppress undesired concepts only when they appear, ensuring that unrelated regions remain unaffected. This selective activation enables precise, context-aware erasure without degrading overall image quality. This approach achieves effective removal of harmful or unwanted content across a wide range of visual concepts, all without model retraining. CASteer outperforms state-of-the-art concept erasure techniques while preserving unrelated content and minimizing unintended effects. Pseudocode is provided in the supplementary.
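
The selective behaviour described above (suppress a concept only where it appears) can be illustrated with a small sketch. This excerpt only names a "projection-based suppression scheme" and does not give its formula, so the rule below is an assumption for illustration: the component of a patch's CA output along the steering direction is removed only when that component is positive. The function name `suppress` and the toy vectors are hypothetical.

```python
import math
from typing import List


def suppress(patch: List[float], v: List[float]) -> List[float]:
    """Remove the component of `patch` along steering vector `v`,
    but only if the patch points in the concept's direction.

    ASSUMPTION: this exact rule is illustrative; the excerpt does not
    specify the paper's projection formula.
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(patch)  # no direction to project onto
    unit = [x / norm for x in v]
    # Signed length of the patch along the concept direction.
    coeff = sum(p * u for p, u in zip(patch, unit))
    if coeff <= 0.0:
        return list(patch)  # concept not present: leave the patch untouched
    return [p - coeff * u for p, u in zip(patch, unit)]


v = [2.0, 0.0]                              # toy steering vector
with_concept = suppress([3.0, 4.0], v)      # positive component is removed
without_concept = suppress([-1.0, 4.0], v)  # left unchanged
```

Under this assumed rule, patches whose activations do not align with the concept direction pass through unmodified, which is one way to read the claim that unrelated regions remain unaffected.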

Paper Structure

This paper contains 44 sections, 15 equations, 47 figures, 16 tables, 2 algorithms.

Figures (47)

  • Figure 1: Main pipeline. (Bottom left, gray background) To compute a steering vector, we prompt the diffusion model with two prompts that differ in the desired concept, e.g., "anime style", and save the CA outputs at each timestep $t$ and each CA layer $i$. We average these outputs over image patches to obtain averaged CA outputs $ca^{pos\_avg}_{it}$ and $ca^{neg\_avg}_{it}$ for each $t$ and $i$. Subtracting the latter from the former gives the steering vector $ca^{anime}_{it}$ for layer $i$ and timestep $t$. (Right) To delete concept $X$ from a generation, at each denoising step $t$ we subtract the steering vector $ca^{X}_{it}$, multiplied by intensity $\alpha$, from the CA outputs of layer $i$.
  • Figure 2: SPM failure in removing implicitly defined concepts (SD-1.4). Top: CASteer, Bottom: SPM. Left: “a mouse from Disneyland,” Right: “a yellow Pokemon.” CASteer erases Mickey and Pikachu concepts despite not being explicitly named, while SPM fails.
  • Figure 3: Evaluation of nudity-erased models. Robustness is measured with nudity prompts from the I2P dataset, while locality is assessed using COCO-30K prompts.
  • Figure 3: Comparison of various methods on concrete concept erasure (removing "Snoopy")
  • Figure 4: Qualitative results on SDXL (left) and SANA (right) on removing "Snoopy". Top: original model generations, bottom: generations of model steered to remove "Snoopy"
  • ...and 42 more figures
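
The pipeline in the Figure 1 caption can be sketched in a few lines of plain Python. This is a minimal illustration on toy data, not the authors' implementation: the function names (`mean_over_patches`, `steering_vector`, `steer`) are hypothetical, CA outputs are represented as a list of per-patch vectors, and a real model would store one steering vector per layer $i$ and timestep $t$.

```python
from typing import List

Vector = List[float]


def mean_over_patches(ca_out: List[Vector]) -> Vector:
    """Average CA outputs over image patches (one row per patch)."""
    n = len(ca_out)
    return [sum(col) / n for col in zip(*ca_out)]


def steering_vector(ca_pos: List[Vector], ca_neg: List[Vector]) -> Vector:
    """Patch-averaged positive-prompt output minus the negative-prompt one."""
    pos_avg = mean_over_patches(ca_pos)
    neg_avg = mean_over_patches(ca_neg)
    return [p - q for p, q in zip(pos_avg, neg_avg)]


def steer(ca_out: List[Vector], v: Vector, alpha: float) -> List[Vector]:
    """Subtract alpha * v from the CA output of every patch."""
    return [[x - alpha * s for x, s in zip(patch, v)] for patch in ca_out]


# Toy CA outputs for a single (layer i, timestep t): 2 patches, 2 channels.
ca_pos = [[1.0, 2.0], [3.0, 4.0]]   # prompt containing the concept
ca_neg = [[0.0, 1.0], [2.0, 1.0]]   # same prompt without the concept
v = steering_vector(ca_pos, ca_neg)

# At inference, steer the CA output of the same layer and timestep.
steered = steer([[5.0, 5.0]], v, alpha=2.0)
```

In this sketch `alpha` plays the role of the intensity $\alpha$ from the caption: larger values push the CA outputs further away from the concept direction.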