Generalized Diffusion Model with Adjusted Offset Noise
Takuro Kutsuna
TL;DR
This work addresses the difficulty diffusion models have in generating data with extreme brightness. It introduces a probabilistically grounded generalization of diffusion models that diffuses inputs into Gaussian distributions with arbitrary means. The authors define forward and reverse processes that incorporate an auxiliary noise variable $\bm{\xi}$ with time-dependent scaling. They derive a loss from the evidence lower bound that mirrors the offset-noise loss but uses principled coefficients $\phi_t$ and $\psi_t$, and they show that standard diffusion is a special case of the framework. The approach is compatible with $v$-prediction, and the paper details how to construct $\gamma_t$, including a balanced strategy that aligns the noise terms across time steps. On a synthetic Cylinder dataset, the model handles brightness extremes better than the baselines, especially in high dimensions, as measured by the 1-Wasserstein distance and MMD. The work thus offers a rigorous theoretical interpretation of offset noise, makes diffusion models more flexible, and suggests practical paths toward robust brightness control in diffusion-based generative modeling.
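The TL;DR reports 1-Wasserstein and MMD scores without saying how they are computed. The following is a minimal sketch of generic estimators, not the paper's evaluation code. It assumes samples are stored as NumPy arrays of shape (n, d) and provides an unbiased MMD² estimator with a Gaussian (RBF) kernel, whose `bandwidth` is an assumed free parameter. It also provides an exact 1-Wasserstein distance between two equal-size one-dimensional samples, such as a hypothetical per-sample brightness summary. All function names are illustrative.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth):
    """Gaussian kernel matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 * bandwidth^2))."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd2_unbiased(x, y, bandwidth=1.0):
    """Unbiased estimate of squared MMD between samples x (m, d) and y (n, d)."""
    m, n = len(x), len(y)
    k_xx = rbf_kernel(x, x, bandwidth)
    k_yy = rbf_kernel(y, y, bandwidth)
    k_xy = rbf_kernel(x, y, bandwidth)
    # Exclude the diagonal (i == j) terms from the within-sample averages.
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (m * (m - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * k_xy.mean()


def wasserstein1_1d(u, v):
    """Exact 1-Wasserstein distance between two equal-size 1-D empirical samples."""
    u = np.sort(np.asarray(u, dtype=float))
    v = np.sort(np.asarray(v, dtype=float))
    if u.shape != v.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    # For equal sizes, the optimal coupling matches sorted values one to one.
    return np.mean(np.abs(u - v))
```

For unequal or weighted samples, `scipy.stats.wasserstein_distance` handles the one-dimensional case directly. The RBF bandwidth is often chosen with a median heuristic. That choice is likewise an assumption here and not the paper's stated setting.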
Abstract
Diffusion models have become fundamental tools for modeling data distributions in machine learning, with applications in image generation, drug discovery, and audio synthesis. Despite their success, these models struggle to generate data with extreme brightness values, a limitation evident in widely used frameworks such as Stable Diffusion. Offset noise has been proposed as an empirical solution to this issue, yet its theoretical basis remains insufficiently explored. In this paper, we propose a generalized diffusion model that naturally incorporates additional noise within a rigorous probabilistic framework. Our approach modifies both the forward and reverse diffusion processes, enabling inputs to be diffused into Gaussian distributions with arbitrary mean structures. We derive a loss function based on the evidence lower bound and show that it is theoretically equivalent to offset noise up to certain adjustments, while applying more broadly. Experiments on synthetic datasets demonstrate that our model effectively addresses brightness-related challenges and outperforms conventional methods in high-dimensional settings.
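For reference, the conventional offset-noise heuristic that the abstract sets out to reinterpret can be sketched as follows. This sketch assumes image batches of shape (B, C, H, W) and a standard DDPM-style forward step. It adds a per-sample, per-channel constant, scaled by a hypothetical `offset_strength` parameter, to ordinary Gaussian noise. It illustrates only the empirical heuristic, not the paper's generalized model or its derived coefficients.

```python
import numpy as np


def offset_noise(shape, offset_strength=0.1, rng=None):
    """Gaussian noise plus a per-sample, per-channel constant offset.

    shape: (B, C, H, W). The offset is drawn once per (sample, channel)
    and broadcast over the spatial dimensions H and W.
    """
    rng = np.random.default_rng() if rng is None else rng
    batch, channels, _, _ = shape
    eps = rng.standard_normal(shape)                       # ordinary i.i.d. noise
    offset = rng.standard_normal((batch, channels, 1, 1))  # one value per channel
    return eps + offset_strength * offset


def noisy_sample(x0, alpha_bar_t, noise):
    """Standard forward step: sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * noise


# Usage: in this recipe the denoiser is trained to predict `noise` from `x_t`.
rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(4, 3, 8, 8))
noise = offset_noise(x0.shape, offset_strength=0.1, rng=rng)
x_t = noisy_sample(x0, alpha_bar_t=0.5, noise=noise)
```

Because the offset shifts every pixel of a channel by the same amount, the usual intuition is that it exposes the network to changes in overall brightness that i.i.d. noise rarely produces. Setting `offset_strength=0` recovers ordinary Gaussian noise.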
