SFBD Flow: A Continuous-Optimization Framework for Training Diffusion Models with Noisy Samples

Haoye Lu, Darren Lo, Yaoliang Yu

Abstract

Diffusion models achieve strong generative performance but often rely on large datasets that may include sensitive content. This challenge is compounded by the models' tendency to memorize training data, raising privacy concerns. SFBD (Lu et al., 2025) addresses this by training on corrupted data and using limited clean samples to capture local structure and improve convergence. However, its iterative denoising and fine-tuning loop requires manual coordination, making it burdensome to implement. We reinterpret SFBD as an alternating projection algorithm and introduce a continuous variant, SFBD flow, that removes the need for alternating steps. We further show its connection to consistency constraint-based methods, and demonstrate that its practical instantiation, Online SFBD, consistently outperforms strong baselines across benchmarks.

Paper Structure

This paper contains 27 sections, 10 theorems, 102 equations, 14 figures, 4 tables, 2 algorithms.

Key Result

Proposition 1

For $k \geq 0$, $D_{\mathrm{KL}}(p_{\rm data} \,\|\, p_0^{k+1, \gamma}) - D_{\mathrm{KL}}(p_{\rm data} \,\|\, p_0^{k, \gamma}) \leq -\gamma D_{\mathrm{KL}}(p_\tau^* \,\|\, p_\tau^{k, \gamma})$. In addition, for $K \geq 1$, $\mathbf{u} \in \mathbb{R}^d$, and $M = D_{\mathrm{KL}}({p_{\rm data}} \,\|\, {p_{\mat
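
The first inequality of Proposition 1 can be telescoped into a convergence rate. The lines below sketch that standard step; they are not a reconstruction of the truncated remainder of the statement. The sketch assumes only that $D_{\mathrm{KL}}(p_{\rm data} \,\|\, p_0^{0, \gamma})$ is finite and that KL divergences are nonnegative. Summing the inequality over $k = 0, \dots, K-1$ gives

$$
\gamma \sum_{k=0}^{K-1} D_{\mathrm{KL}}(p_\tau^* \,\|\, p_\tau^{k, \gamma})
\;\leq\; D_{\mathrm{KL}}(p_{\rm data} \,\|\, p_0^{0, \gamma}) - D_{\mathrm{KL}}(p_{\rm data} \,\|\, p_0^{K, \gamma})
\;\leq\; D_{\mathrm{KL}}(p_{\rm data} \,\|\, p_0^{0, \gamma}).
$$

Because the minimum of the summands is at most their average,

$$
\min_{0 \leq k < K} D_{\mathrm{KL}}(p_\tau^* \,\|\, p_\tau^{k, \gamma})
\;\leq\; \frac{D_{\mathrm{KL}}(p_{\rm data} \,\|\, p_0^{0, \gamma})}{\gamma K}.
$$

Under these assumptions, the smallest of the first $K$ divergences $D_{\mathrm{KL}}(p_\tau^* \,\|\, p_\tau^{k, \gamma})$ is $O(1/(\gamma K))$.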

Figures (14)

  • Figure 1: SFBD uses alternating projection to guide the stochastic process sequences $P^k$ and $M^k$ to converge to the optimal $P^*$. When $\gamma \rightarrow 0$, the changes of $P^k$ and $M^k$ become smooth and we obtain $\gamma$-SFBD.
  • Figure 2: FID scores of Online SFBD (OSFBD) on CIFAR-10 under different settings. Unless specified, the clean ratio is $0.04$, noise level $\sigma=0.59$, and gradient steps $m=20$. (c) reports results with OSFBD-VAN pretraining.
  • Figure 3: 50 clean samples, noise level $\sigma = 0.2$.
  • Figure 4: 5,000 clean samples (10%), noise level $\sigma = 0.2$.
  • Figure 5: 2,000 clean samples (4%), noise level $\sigma = 0.59$.
  • ...and 9 more figures

Theorems & Definitions (10)

  • Proposition 1
  • Proposition 2
  • Corollary 1
  • Proposition 3
  • Proposition 3
  • Proposition 3
  • Corollary 1
  • Proposition 3
  • Proposition 4: LuWY2025, Prop 1
  • Lemma 1: PavonA1991, VargasTLL2021