Edge-preserving noise for diffusion models
Jente Vandersanden, Sascha Holl, Xingchang Huang, Gurprit Singh
TL;DR
This work introduces edge-preserving diffusion, a content-aware generalization of isotropic diffusion. It uses a hybrid forward process that preserves edges at first and then gradually transitions to isotropic noise. The forward process is governed by a transition function $τ(t)$ with transition point $t_Φ$ and by a time-varying edge sensitivity $λ(t)$, which together modulate the noise according to image structure. The network is trained to predict the non-isotropic noise with the loss $L = || f_θ(x_t,t) - σ_t ε_t ||^2$. The backward posteriors and the training procedure are adapted to this non-isotropic setting, using tensor-valued variances and a corresponding analytic update, which enables faster convergence and better learning of low-to-mid frequency content. Empirically, the method improves FID and CLIP scores by up to 30% on unconditional and shape-guided tasks, including stroke-based generation, with minimal computational overhead. Overall, edge-preserving diffusion improves structural fidelity and robustness, offering a practical enhancement to diffusion-based generation with strong potential for downstream editing and shape-guided synthesis.
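The hybrid forward process summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The sigmoid form of $τ(t)$, the linear form of $λ(t)$, the cosine signal schedule, the Perona-Malik-style edge term $1/\sqrt{1 + |∇x_0| / λ(t)}$, and all function and parameter names are assumptions chosen for illustration only.

```python
import numpy as np

def transition(t, t_phi, steepness=10.0):
    # Hypothetical tau(t): close to 0 before t_phi (edge-preserving regime),
    # close to 1 after it (isotropic regime). The sigmoid is an assumption.
    return 1.0 / (1.0 + np.exp(-steepness * (t - t_phi)))

def edge_sensitivity(t, lam_min=1e-4, lam_max=1e-1):
    # Hypothetical lambda(t): linear interpolation over t in [0, 1].
    return lam_min + t * (lam_max - lam_min)

def noise_scale(x0, t, t_phi=0.5):
    # Per-pixel multiplier on the noise standard deviation.
    # Pixels with a large gradient magnitude (edges) receive less noise
    # while tau(t) is small; the multiplier tends to 1 as tau(t) -> 1.
    gy, gx = np.gradient(x0)
    grad_mag = np.hypot(gx, gy)
    edge_term = 1.0 / np.sqrt(1.0 + grad_mag / edge_sensitivity(t))
    tau = transition(t, t_phi)
    return (1.0 - tau) * edge_term + tau

def forward_sample(x0, t, rng, t_phi=0.5):
    # Assumed cosine schedule for the signal coefficient alpha_bar(t).
    alpha_bar = np.cos(0.5 * np.pi * t) ** 2
    sigma_t = np.sqrt(1.0 - alpha_bar) * noise_scale(x0, t, t_phi)
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar) * x0 + sigma_t * eps
    # Regression target matching L = || f_theta(x_t, t) - sigma_t * eps ||^2.
    target = sigma_t * eps
    return x_t, target

# Usage: a grayscale image with a single vertical step edge.
x0 = np.zeros((8, 8))
x0[:, 4:] = 1.0
x_t, target = forward_sample(x0, t=0.1, rng=np.random.default_rng(0))
```

Under these assumptions, the multiplier returned by `noise_scale` stays near 1 in flat regions and drops at the step edge for small `t`, then approaches 1 everywhere once `t` is past `t_phi`, which mirrors the described transition from edge-preserving to isotropic noise.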
Abstract
Classical generative diffusion models learn an isotropic Gaussian denoising process that treats all spatial regions uniformly and thus neglects potentially valuable structural information in the data. Inspired by the long-established work on anisotropic diffusion in image processing, we present a novel edge-preserving diffusion model that generalizes existing isotropic models through a hybrid noise scheme. In particular, we introduce an edge-aware noise scheduler that varies between edge-preserving and isotropic Gaussian noise. We show that our model's generative process converges faster to results that more closely match the target distribution. We demonstrate its capability to better learn the low-to-mid frequencies within the dataset, which play a crucial role in representing shapes and structural information. Our edge-preserving diffusion process consistently outperforms state-of-the-art baselines in unconditional image generation. It is also notably more robust for generative tasks guided by a shape-based prior, such as stroke-to-image generation. We present qualitative and quantitative results (FID and CLIP score) showing consistent improvements of up to 30% for both tasks.
