On gauge freedom, conservativity and intrinsic dimensionality estimation in diffusion models

Christian Horvat, Jean-Pascal Pfister

TL;DR

This work introduces gauge freedom in diffusion models by decomposing the learned score field into a conservative part and a gauge remainder that preserves exact sampling and density estimation. It proves that exactness does not require conservativity; rather, the remainder must satisfy a divergence-based gauge condition, and the conservative component must match the true score for exact density/sampling. The framework also shows conservativity is advantageous for extracting local data-manifold information, enabling intrinsic dimensionality estimation via the Jacobian dynamics and singular-value evolution. Practically, these insights clarify when to constrain diffusion models and suggest penalties to enforce gauge conditions, with potential impact on both generation fidelity and manifold analysis in high dimensions.

Abstract

Diffusion models are generative models that have recently demonstrated impressive performance in sampling quality and density estimation in high dimensions. They rely on a forward continuous diffusion process and a backward continuous denoising process; the latter can be described by a time-dependent vector field and is used as a generative model. In the original formulation of the diffusion model, this vector field is assumed to be the score function (i.e. the gradient of the log-probability at a given time in the diffusion process). Curiously, on the practical side, most studies on diffusion models implement this vector field as a neural network and do not constrain it to be the gradient of some energy function (that is, most studies do not constrain the vector field to be conservative). Even though some studies investigated empirically whether such a constraint leads to a performance gain, they reached contradictory results and did not provide analytical results. Here, we provide three analytical results regarding the extent of the modeling freedom of this vector field. Firstly, we propose a novel decomposition of vector fields into a conservative component and an orthogonal component which satisfies a given (gauge) freedom. Secondly, from this orthogonal decomposition, we show that exact density estimation and exact sampling are achieved when the conservative component is exactly equal to the true score, and therefore conservativity is neither necessary nor sufficient to obtain exact density estimation and exact sampling. Finally, we show that when it comes to inferring local information of the data manifold, constraining the vector field to be conservative is desirable.
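
To make "conservative" concrete: for a smooth vector field on a simply connected domain, being the gradient of some scalar function is equivalent to having a symmetric Jacobian. The sketch below checks this numerically with central finite differences. The function names (`jacobian`, `asymmetry`, `score_gaussian`, `rotated_field`) and the two example fields are illustrative assumptions, not taken from the paper.

```python
def jacobian(v, x, h=1e-5):
    """Central finite-difference Jacobian: J[i][j] = d v_i / d x_j."""
    n = len(x)
    columns = []
    for j in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        v_plus, v_minus = v(x_plus), v(x_minus)
        columns.append([(a - b) / (2 * h) for a, b in zip(v_plus, v_minus)])
    # columns[j][i] holds d v_i / d x_j, so transpose into row-major J.
    return [[columns[j][i] for j in range(n)] for i in range(n)]


def asymmetry(v, x, h=1e-5):
    """Largest entry of |J - J^T| at x; close to 0 for a conservative field."""
    J = jacobian(v, x, h)
    n = len(x)
    return max(abs(J[i][j] - J[j][i]) for i in range(n) for j in range(n))


def score_gaussian(x):
    """Score of a standard Gaussian, grad log N(x; 0, I) = -x (conservative)."""
    return [-xi for xi in x]


def rotated_field(x):
    """Illustrative 2D field: the Gaussian score plus a rotation (-x2, x1)."""
    return [-x[0] - x[1], -x[1] + x[0]]


point = [0.3, -1.2]
print(asymmetry(score_gaussian, point))  # close to 0
print(asymmetry(rotated_field, point))   # close to 2
```

A pointwise check like this can only flag non-conservativity at the sampled points; in practice one would evaluate it over many inputs (and diffusion times) of the learned network.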

Paper Structure (11 sections, 5 theorems, 33 equations, 5 figures)

This paper contains 11 sections, 5 theorems, 33 equations, 5 figures.

Key Result

Theorem 1

Let $t\in [0,1]$. For any vector field $v\in L^2(p)$, there exists a unique conservative vector field $\nabla \phi \in L^2(p)$ and a unique vector field $r \in L^2(p)$ fulfilling the gauge freedom condition (eq:condition_r) such that $v = \nabla \phi + r$, with $\nabla \phi$ and $r$ orthogonal in $L^2(p)$.

Figures (5)

  • Figure 1: Every vector field $v\in L^2(p)$ can be orthogonally decomposed into a conservative vector field $\nabla \phi$ and a remainder term $r$ that satisfies the gauge freedom condition (eq:condition_r). (A) Exact sampling and density estimation are obtained when the conservative component $\nabla \phi$ of the vector field $v$ equals the true score (i.e. $\nabla \phi = \nabla\log p$), which is the case for all points on the green dashed line. So $v$ does not need to be conservative. (B) Even if $v$ is conservative, this is not sufficient to guarantee exact sampling and density estimation, since $v$ may differ from the true score.
  • Figure 2: Intuition of how the singular values of $Y(\mathbf{x}_1,t)=\frac{\partial \phi_{t}(\mathbf{x}_1 ) }{ \partial \mathbf{x}_1 }$ evolve over time for a low-dimensional data manifold. The singular value in the manifold direction will saturate, while the singular values in the off-manifold directions will tend to $0$ (bottom left).
  • Figure 3: A: Singular values of $Y_{t}$ as predicted by the lemma in the appendix for conservative $s_{\theta}$ (left) and non-conservative $s_{\theta}$ (right). Each color represents one singular value (5 in total, as the embedding dimension is $5$). B: Intrinsic dimensionality estimation for spheres of dimension $d=D/2-1$ embedded in $D$ dimensions, for different values of $d$.
  • Figure 4: Singular value trajectories of a torus for different embedding dimensions ($D=3$ and $D=5$). We show the evolution for both a conservative and a non-conservative diffusion model.
  • Figure 5: Singular value trajectories of the Swiss Roll and sphere for different embedding dimensions ($D=3$ and $D=5$). We show the evolution for both a conservative and a non-conservative diffusion model.
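
The captions above describe on-manifold singular values saturating while off-manifold ones decay toward $0$. One simple way to turn such a spectrum into a dimension estimate is to cut at the largest multiplicative gap between consecutive sorted singular values. The sketch below implements that heuristic; the gap rule, the function name `estimate_intrinsic_dim`, and the example spectrum are assumptions made for illustration and are not necessarily the estimator used in the paper.

```python
def estimate_intrinsic_dim(singular_values, eps=1e-12):
    """Count the singular values lying above the largest ratio gap.

    singular_values: final-time singular values of the Jacobian, any order.
    eps: floor that avoids division by zero for fully collapsed values.
    """
    s = sorted((max(float(v), eps) for v in singular_values), reverse=True)
    if len(s) < 2:
        return len(s)
    # Ratio between each singular value and the next smaller one.
    ratios = [s[i] / s[i + 1] for i in range(len(s) - 1)]
    # Index of the largest gap; everything before it counts as "on-manifold".
    largest_gap = max(range(len(ratios)), key=lambda i: ratios[i])
    return largest_gap + 1


# Hypothetical spectrum: two saturated values, three collapsed ones (D = 5).
print(estimate_intrinsic_dim([0.98, 0.95, 1e-3, 5e-4, 2e-4]))  # 2
```

The heuristic always places a cut somewhere, so it cannot report a full-dimensional manifold ($d = D$); a fixed threshold on the singular values would be an alternative in that regime.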

Theorems & Definitions (7)

  • Theorem 1: Orthogonal decomposition
  • Corollary 1
  • Remark 1
  • Theorem 2
  • Remark 2
  • Lemma 1
  • Lemma 2
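
As a worked illustration of the orthogonal decomposition in Theorem 1, the sketch below builds a non-conservative field $v = \nabla\log p + r$ for a 2D standard Gaussian $p$. The gauge condition itself is not reproduced on this page; the code assumes it takes the divergence form $\nabla\cdot r + r\cdot\nabla\log p = 0$ (equivalently $\nabla\cdot(p\,r)=0$), which is consistent with the "divergence-based gauge condition" mentioned in the TL;DR but should be checked against the paper. The remainder $r(x) = (-x_1 x_2,\; x_1^2 - 1)$ and all function names are hypothetical choices for this example.

```python
import random


def score(x):
    """True score of N(0, I) in 2D: grad log p(x) = -x."""
    return [-x[0], -x[1]]


def remainder(x):
    """Hypothetical remainder r(x) = (-x1*x2, x1^2 - 1)."""
    return [-x[0] * x[1], x[0] ** 2 - 1.0]


def divergence(f, x, h=1e-4):
    """Central finite-difference divergence of the vector field f at x."""
    total = 0.0
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        total += (f(x_plus)[i] - f(x_minus)[i]) / (2 * h)
    return total


def gauge_residual(x):
    """Assumed gauge condition: div r + r . grad log p, zero if satisfied."""
    r, s = remainder(x), score(x)
    return divergence(remainder, x) + sum(a * b for a, b in zip(r, s))


def l2p_inner(f, g, n=20000, seed=0):
    """Monte Carlo estimate of <f, g> in L^2(p) with p = N(0, I)."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        acc += sum(a * b for a, b in zip(f(x), g(x)))
    return acc / n


print(gauge_residual([0.7, -0.4]))  # close to 0 at this point
print(l2p_inner(score, remainder))  # small, shrinking as n grows
```

Here the pointwise product $\nabla\log p \cdot r$ equals $x_2$, which is nonzero at most points but averages to zero under $p$, so the orthogonality holds in $L^2(p)$ rather than pointwise. The field $v = \nabla\log p + r$ is not conservative, yet its conservative component is exactly the true score.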