Safety-Guided Flow (SGF): A Unified Framework for Negative Guidance in Safe Generation

Mingyu Kim, Young-Heon Kim, Mijung Park

Abstract

Safety mechanisms for diffusion and flow models have recently been developed along two distinct paths. In robot planning, control barrier functions are employed to guide generative trajectories away from obstacles at every denoising step by explicitly imposing geometric constraints. In parallel, recent data-driven negative guidance approaches have been shown to suppress harmful content and promote diversity in generated samples. However, these approaches rely on heuristics and do not clearly state when safety guidance is actually necessary. In this paper, we first introduce a unified probabilistic framework for image generation tasks, based on a Maximum Mean Discrepancy (MMD) potential, that recasts both Shielded Diffusion and Safe Denoiser as instances of our energy-based negative guidance against unsafe data samples. Furthermore, we leverage control barrier function analysis to justify the existence of a critical time window in which negative guidance must be strong; outside of this window, the guidance should decay to zero to ensure safe and high-quality generation. We evaluate our unified framework on several realistic safe generation scenarios, confirming that negative guidance should be applied in the early stages of the denoising process for successful safe generation.
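The abstract combines two ingredients: a repulsive potential built from kernel similarity to unsafe samples, and a guidance weight that is nonzero only inside an early time window. The Python sketch below illustrates both under our own assumptions. It is not the paper's implementation. It uses an RBF kernel with a hypothetical `bandwidth` and a potential $E(x)$ equal to the mean kernel similarity of a particle to the unsafe samples. Summed over particles and scaled by $-2/N$, this is the cross term of the squared MMD between the particles and the unsafe set, so descending $E$ raises that term's contribution to the discrepancy. The sketch also uses a hard early-stop schedule with hypothetical `scale` and `t_stop` parameters, and an Euler sampler that runs from $t=0$ (noise) to $t=1$ (data). The base `velocity_fn` stands in for a pretrained flow and is supplied by the user.

```python
import numpy as np


def unsafe_similarity_grad(x, unsafe, bandwidth):
    """Gradient of E(x) = mean_j k(x, u_j) for an RBF kernel k of width `bandwidth`.

    x: (N, d) particles; unsafe: (M, d) unsafe samples; returns an (N, d) array.
    Moving along -grad lowers each particle's mean kernel similarity to the unsafe set.
    """
    diff = x[:, None, :] - unsafe[None, :, :]          # (N, M, d) pairwise offsets
    sq_dist = np.sum(diff ** 2, axis=-1)               # (N, M) squared distances
    k = np.exp(-sq_dist / (2.0 * bandwidth ** 2))      # (N, M) kernel values
    # d/dx exp(-||x - u||^2 / (2 h^2)) = -(x - u) / h^2 * k(x, u), averaged over u_j
    return -(k[..., None] * diff).mean(axis=1) / bandwidth ** 2


def guidance_weight(t, scale, t_stop):
    """Hypothetical hard early-stop schedule: `scale` for t < t_stop, zero afterwards."""
    return scale if t < t_stop else 0.0


def guided_velocity(x, t, velocity_fn, unsafe, bandwidth, scale, t_stop):
    """Base velocity plus negative guidance along -grad E inside the early window."""
    v = velocity_fn(x, t)
    w = guidance_weight(t, scale, t_stop)
    if w == 0.0:
        return v  # outside the window the base flow runs unmodified
    return v - w * unsafe_similarity_grad(x, unsafe, bandwidth)


def sample(x0, velocity_fn, unsafe, bandwidth, scale, t_stop, n_steps=100):
    """Explicit Euler integration from t = 0 (noise) to t = 1 (data)."""
    x = np.array(x0, dtype=float)  # copy so the caller's array is untouched
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        x = x + dt * guided_velocity(x, t, velocity_fn, unsafe, bandwidth, scale, t_stop)
    return x
```

A hard cutoff is the simplest schedule that matches "decay to zero". A smooth decay, such as a sigmoid in $t$, would be an equally plausible choice. The window boundaries that the paper derives from its control barrier function analysis are not reproduced here.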

Paper Structure (58 sections, 7 theorems, 36 equations, 12 figures, 6 tables, 1 algorithm)

Key Result

Theorem 1

There exists a nonnegative weight $\beta^\ast(\bm{x}_t)$, monotone in the posterior likelihood that $\bm{x}_t$ originates from the unsafe set, such that

Figures (12)

  • Figure 1: (a) By incorporating SAFREE yoon2024safree and SLD schramowski2023safe, our method avoids generating inappropriate images. (b) On an artificially memorized SDv2.1 model somepalli2023understanding, it mitigates memorization; early-stopped negative guidance preserves quality, enhances diversity, and reveals a critical time window. All images shown are the top 5% of samples most similar to the Imagenette training set.
  • Figure 2: Motivation: 2D flow-matching toy example. (a) A pretrained flow with “negative” data points highlighted in orange. (b) Learned velocity field $f_{\theta}(x)$ together with the negative-guidance direction $\nabla_x E(x)$; this panel depicts samples at $t=0.8$. (c) Samples generated with full negative guidance; squared Wasserstein distance to the target distribution (excluding negative regions) $W^2=1.009$. (d) Samples generated with early-stop negative guidance; squared Wasserstein distance $W^2=0.937$. Applying full negative guidance either leaves mass near the unsafe set or distorts nearby modes. In contrast, early stopping of the guidance reduces the probability of placing particles near the unsafe region and produces samples that better match the target distribution.
  • Figure 3: Memorization under Imagenette fine-tuning.
  • Figure 4: Ablation on time windows of negative guidance.
  • Figure D.1: Reference images for safe generation against nudity prompts
  • ...and 7 more figures
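Figure 2 compares samples to the target distribution using a squared Wasserstein distance. A minimal way to compute such a number is an exact optimal assignment with SciPy's `linear_sum_assignment`. This sketch assumes two equally sized point clouds with uniform weights. This listing does not state how the paper builds its target set or estimates $W^2$. For uniform measures of equal size, an optimal transport plan can be taken to be a permutation, so the assignment cost is exact. The helper name `squared_w2` is hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def squared_w2(samples, target):
    """Squared 2-Wasserstein distance between two uniform empirical measures.

    samples, target: (n, d) arrays with the same n. With equal uniform weights an
    optimal coupling can be taken to be a permutation, so an exact assignment suffices.
    """
    samples = np.asarray(samples, dtype=float)
    target = np.asarray(target, dtype=float)
    if samples.shape != target.shape:
        raise ValueError("this sketch assumes equally sized point clouds")
    cost = cdist(samples, target, metric="sqeuclidean")   # (n, n) squared distances
    rows, cols = linear_sum_assignment(cost)              # optimal one-to-one matching
    return float(cost[rows, cols].mean())                 # average matched cost
```

A quick sanity check is to translate a point cloud by a vector $c$, which gives $W^2 = \lVert c \rVert^2$. The assignment step takes cubic time in the number of points. For large sample sets, an entropic or sliced approximation is the usual substitute.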

Theorems & Definitions (12)

  • Theorem 1 (Theorem 3.2 in kim2025trainingfreesafedenoiserssafe): Safe vs. data/unsafe denoisers
  • Proposition 1: Safe Denoiser as MMD-gradient guidance
  • Proposition 2: Radius–bandwidth matching
  • Theorem 2: Forward-time critical window
  • Lemma 1: Integrating factor (forward)
  • Lemma 2: Comparison (forward)
  • Proof
  • Proof of the sufficient certificate
  • Proof of the necessary certificate
  • Proof
  • ...and 2 more
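A short derivation shows how a weight with the monotonicity described in Theorem 1 can arise. It rests on an assumption of ours, which may differ from the setting of the paper or of kim2025trainingfreesafedenoiserssafe. We assume the data distribution is a two-component mixture $(1-\pi)\,p_{\text{safe}} + \pi\,p_{\text{unsafe}}$, with both components noised by the same forward kernel. Write $w(\bm{x}_t)$ for the posterior probability that $\bm{x}_t$ came from the unsafe component. The data denoiser is then a $w$-weighted average of the safe and unsafe denoisers, and solving for the safe one gives:

```latex
% Assumption (ours, for illustration): p_data = (1 - \pi) p_safe + \pi p_unsafe,
% with both components noised by the same forward kernel q(x_t | x_0).
\begin{align*}
w(\bm{x}_t) &:= \Pr\big(\bm{x}_0 \in \text{unsafe} \mid \bm{x}_t\big)
  = \frac{\pi\, p^{\text{unsafe}}_t(\bm{x}_t)}{p_t(\bm{x}_t)}, \\
\mathbb{E}_{\text{data}}[\bm{x}_0 \mid \bm{x}_t]
  &= \big(1 - w(\bm{x}_t)\big)\,\mathbb{E}_{\text{safe}}[\bm{x}_0 \mid \bm{x}_t]
   + w(\bm{x}_t)\,\mathbb{E}_{\text{unsafe}}[\bm{x}_0 \mid \bm{x}_t], \\
\mathbb{E}_{\text{safe}}[\bm{x}_0 \mid \bm{x}_t]
  &= \mathbb{E}_{\text{data}}[\bm{x}_0 \mid \bm{x}_t]
   + \beta(\bm{x}_t)\Big(\mathbb{E}_{\text{data}}[\bm{x}_0 \mid \bm{x}_t]
   - \mathbb{E}_{\text{unsafe}}[\bm{x}_0 \mid \bm{x}_t]\Big),
  \qquad \beta(\bm{x}_t) = \frac{w(\bm{x}_t)}{1 - w(\bm{x}_t)} \ge 0 .
\end{align*}
```

For $w < 1$, $\beta$ is nonnegative and increasing in $w$, which is consistent with the description of $\beta^\ast$. Whether it matches the paper's exact weight depends on conditions not reproduced in this listing.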