
Generalized Diffusion Model with Adjusted Offset Noise

Takuro Kutsuna

TL;DR

This work addresses the difficulty diffusion models have in generating data with extreme brightness by introducing a probabilistically grounded generalization that diffuses inputs into Gaussian distributions with arbitrary means. The author defines forward and reverse processes that incorporate an auxiliary noise variable $\bm{\xi}$ with time-dependent scaling, derives a loss based on the evidence lower bound that mirrors offset-noise losses but with principled coefficients $\phi_t$ and $\psi_t$, and shows that the framework contains standard diffusion as a special case. The approach integrates with $v$-prediction and comes with a detailed methodology for constructing $\gamma_t$, including a balanced strategy that aligns noise terms across time. Empirical results on a synthetic Cylinder dataset demonstrate improved handling of brightness extremes, especially in high dimensions, with favorable 1-Wasserstein and MMD metrics compared to baselines. The work thus offers a rigorous theoretical interpretation of offset noise, extends the flexibility of diffusion models, and suggests practical paths toward robust brightness control in diffusion-based generative modeling.

Abstract

Diffusion models have become fundamental tools for modeling data distributions in machine learning and have applications in image generation, drug discovery, and audio synthesis. Despite their success, these models face challenges when generating data with extreme brightness values, as evidenced by limitations in widely used frameworks like Stable Diffusion. Offset noise has been proposed as an empirical solution to this issue, yet its theoretical basis remains insufficiently explored. In this paper, we propose a generalized diffusion model that naturally incorporates additional noise within a rigorous probabilistic framework. Our approach modifies both the forward and reverse diffusion processes, enabling inputs to be diffused into Gaussian distributions with arbitrary mean structures. We derive a loss function based on the evidence lower bound, establishing its theoretical equivalence to offset noise with certain adjustments, while broadening its applicability. Experiments on synthetic datasets demonstrate that our model effectively addresses brightness-related challenges and outperforms conventional methods in high-dimensional scenarios.
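For context, the empirical offset-noise heuristic mentioned above is commonly implemented by adding one shared Gaussian value per sample and channel to the usual per-pixel noise. The sketch below illustrates that heuristic only, not the generalized model proposed in the paper; it assumes NumPy, and the helper name `offset_noise`, the tensor shape, and the scale 0.1 are illustrative choices rather than values taken from the paper.

```python
import numpy as np

def offset_noise(shape, offset_scale=0.1, rng=None):
    """Per-pixel Gaussian noise plus one shared offset per (sample, channel).

    Hypothetical helper illustrating the empirical heuristic; `offset_scale`
    is an assumed hyperparameter, not a value from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    batch, channels, _, _ = shape
    eps = rng.standard_normal(shape)                        # i.i.d. per-pixel noise
    offset = rng.standard_normal((batch, channels, 1, 1))   # one draw per sample and channel
    return eps + offset_scale * offset                      # offset broadcasts over H and W

rng = np.random.default_rng(0)
shape = (2000, 1, 16, 16)  # (batch, channels, height, width), chosen for illustration

plain = rng.standard_normal(shape)
shifted = offset_noise(shape, offset_scale=0.1, rng=rng)

# Spread of the per-image mean, i.e. how much the noise perturbs average brightness.
plain_std = plain.mean(axis=(1, 2, 3)).std()
shifted_std = shifted.mean(axis=(1, 2, 3)).std()
print(f"std of per-image mean, plain noise:  {plain_std:.4f}")
print(f"std of per-image mean, offset noise: {shifted_std:.4f}")
```

With independent per-pixel noise, the per-image mean has variance 1/(H·W), which shrinks as the dimension grows; the shared offset adds a term `offset_scale**2` that does not shrink. This is one way to see why the heuristic affects average brightness and why the effect matters more in high dimensions.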

Paper Structure

This paper contains 59 sections, 5 theorems, 48 equations, 9 figures, 1 algorithm.

Key Result

Theorem 3.1

Suppose the forward process is defined as in (eq:fp_proposed1) and (eq:fp_proposed2), and the reverse process as in (eq:rev_prop1), (eq:rev_prop2), (eq:rev_prop3), and (eq:rev_prop4). Then the loss function that maximizes the evidence lower bound of $\log p_\theta(\bm{x}_0)$ is given by where $\lambda_t$ is given by (eq:def_lambda), and $\phi_t$ and $\psi_t$ are given by

Figures (9)

  • Figure 1: From left to right, the figure illustrates $\beta_t$ used in Stable Diffusion 1.5, along with the corresponding $\gamma_t$, $\phi_t$, and $\psi_t$ computed using the balanced-$\phi_t, \psi_t$ strategy (the horizontal axis represents time $t$).
  • Figure 2: Distribution of generated data with $n=2$ at each time step during the reverse process. The rightmost column represents the test data. The top row shows the results of the Base model, while the bottom row illustrates those of the Proposed model ($\sigma_c^2=1.0$).
  • Figure 3: Evaluation results of 1WD (top row) and MMD (bottom row) during training.
  • Figure 4: Comparison of distributions of average brightnesses $L_\text{avg}(\bm{x}_0)$ between the test data and the generated data.
  • Figure 5: Evaluation results of 1WD (top) and MMD (bottom) during training within the $v$-prediction framework.
  • ...and 4 more figures

Theorems & Definitions (8)

  • Theorem 3.1: Training loss function
  • Proposition 3.2
  • Lemma 3.3
  • proof
  • Proposition 4.1
  • proof
  • Proposition 5.1: Training loss function for $v$-prediction
  • proof