When Scores Learn Geometry: Rate Separations under the Manifold Hypothesis

Xiang Li, Zebang Shen, Ya-Ping Hsieh, Niao He

Abstract

Score-based methods, such as diffusion models and Bayesian inverse problems, are often interpreted as learning the data distribution in the low-noise limit ($\sigma \to 0$). In this work, we propose an alternative perspective: their success arises from implicitly learning the data manifold rather than the full distribution. Our claim is based on a novel analysis of scores in the small-$\sigma$ regime that reveals a sharp separation of scales: information about the data manifold is $\Theta(\sigma^{-2})$ stronger than information about the distribution. We argue that this insight suggests a paradigm shift from the less practical goal of distributional learning to the more attainable task of geometric learning, which provably tolerates $O(\sigma^{-2})$ larger errors in score approximation. We illustrate this perspective through three consequences: i) in diffusion models, concentration on data support can be achieved with a score error of $o(\sigma^{-2})$, whereas recovering the specific data distribution requires a much stricter $o(1)$ error; ii) more surprisingly, learning the uniform distribution on the manifold, an especially structured and useful object, is also $O(\sigma^{-2})$ easier; and iii) in Bayesian inverse problems, the maximum entropy prior is $O(\sigma^{-2})$ more robust to score errors than generic priors. Finally, we validate our theoretical findings with preliminary experiments on large-scale models, including Stable Diffusion.
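The separation of scales described in the abstract can be checked numerically on a toy case. The sketch below is an illustration under stated assumptions, not the paper's construction: the manifold is taken to be the unit circle in $\mathbb{R}^2$, the data density on it is assumed to be proportional to $1 + a\cos\theta$, and the Gaussian-smoothed score is computed by quadrature from the standard identity $\nabla \log p_\sigma(x) = (\mathbb{E}[y \mid x] - x)/\sigma^2$. The function names, the density, and the values of `a`, `delta`, and `phi` are all hypothetical choices made for illustration.

```python
import numpy as np

def smoothed_score(x, sigma, a=0.5, n_grid=20000):
    """Score of p_sigma = p_data * N(0, sigma^2 I) for data on the unit circle.

    Assumed toy density on the circle: p(theta) proportional to 1 + a*cos(theta).
    Uses the identity score(x) = (E[y | x] - x) / sigma^2, with the posterior
    mean E[y | x] approximated by quadrature on a uniform grid in theta.
    """
    theta = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
    y = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # grid points on the manifold
    log_w = np.log1p(a * np.cos(theta)) - np.sum((x - y) ** 2, axis=1) / (2.0 * sigma**2)
    w = np.exp(log_w - log_w.max())                       # stabilized posterior weights
    w /= w.sum()
    return (w @ y - x) / sigma**2

def normal_and_tangential(sigma, delta=0.1, phi=np.pi / 2):
    """Score components at a point at distance delta outside the circle, at angle phi."""
    normal = np.array([np.cos(phi), np.sin(phi)])
    tangent = np.array([-np.sin(phi), np.cos(phi)])
    x = (1.0 + delta) * normal
    s = smoothed_score(x, sigma)
    return float(s @ normal), float(s @ tangent)

for sigma in (0.1, 0.05, 0.02):
    n, t = normal_and_tangential(sigma)
    print(f"sigma={sigma:4.2f}  normal={n:10.2f}  tangential={t:7.3f}  "
          f"sigma^2*normal={sigma**2 * n:8.4f}")
```

In this toy setting the normal component, which points back toward the circle, grows like $\sigma^{-2}$, while the tangential component, which carries the information about the density along the circle, stays of order one as $\sigma$ shrinks.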

Paper Structure

This paper contains 49 sections, 16 theorems, 130 equations, 4 figures, 4 tables.

Key Result

Theorem 3.1

Assume \ref{assumption:manifold} and \ref{assumption:pdata} hold. For any $x \in T_{\mathcal{M}}(\epsilon)$, where $H(x)$ contains the curvature information of the manifold and $\epsilon$ is a sufficiently small constant; both are independent of $\sigma$. The $o(1)$ term is uniform over $x \in T_{\mathcal{M}}(\epsilon)$.

Figures (4)

  • Figure 1: Toy examples illustrating recovered distributions under different regimes, with the manifold represented as a one-dimensional circle embedded in $\mathbb{R}^2$.
  • Figure 2: Comparison of stationary sample distributions generated with standard Langevin dynamics (L) versus our Tempered Score Langevin dynamics \ref{eq:modified_langevin} with $\alpha = 1$ (TS-1). The circle and ellipse correspond to manifolds with $(a,b) = (1,1)$ and $(a,b) = (1,2)$, respectively.
  • Figure 3: Top row: PC. Bottom row: TS (ours). Samples in the same column are generated using the same prompt, the same number of corrector steps, and the same random seed. As shown, TS produces samples that appear more authentic and contain richer details.
  • Figure 4: Comparison of distributions generated with VE diffusion model versus our TS Langevin dynamics \ref{eq:modified_langevin} with $\alpha = 1$.
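
As a rough companion to the baseline in Figure 2, the following sketch runs standard (unadjusted) Langevin dynamics on the circle case $(a,b) = (1,1)$. It assumes a uniform data distribution on the unit circle and uses the exact Gaussian-smoothed score, computed by quadrature, in place of a learned one. The step size, noise level, particle count, and function names are hypothetical choices. The Tempered Score variant is not reproduced, because its update rule (eq:modified_langevin) does not appear in this excerpt.

```python
import numpy as np

def circle_score(x, sigma, n_grid=400):
    """Score of the uniform distribution on the unit circle smoothed by N(0, sigma^2 I).

    x has shape (n, 2). Uses score(x) = (E[y | x] - x) / sigma^2, with the
    posterior mean approximated on a uniform grid of circle points.
    """
    theta = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
    y = np.stack([np.cos(theta), np.sin(theta)], axis=1)            # (n_grid, 2)
    sq_dist = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=2)  # (n, n_grid)
    log_w = -sq_dist / (2.0 * sigma**2)
    w = np.exp(log_w - log_w.max(axis=1, keepdims=True))            # stabilized weights
    w /= w.sum(axis=1, keepdims=True)
    return (w @ y - x) / sigma**2

def langevin(n_particles=200, sigma=0.2, n_steps=500, seed=0):
    """Unadjusted Langevin: x <- x + h * score(x) + sqrt(2h) * standard normal noise."""
    rng = np.random.default_rng(seed)
    step = 0.1 * sigma**2                  # assumed step size, kept well below sigma^2
    x = rng.normal(size=(n_particles, 2))  # arbitrary initialization off the manifold
    for _ in range(n_steps):
        noise = rng.normal(size=x.shape)
        x = x + step * circle_score(x, sigma) + np.sqrt(2.0 * step) * noise
    return x

samples = langevin()
radii = np.linalg.norm(samples, axis=1)
print("mean radius:", radii.mean(), "std of radius:", radii.std())
```

With these settings the particles settle into a band of width on the order of $\sigma$ around the circle. The run is too short to say anything about the distribution along the circle, which mixes far more slowly than the radial direction.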

Theorems & Definitions (33)

  • Theorem 3.1: Informal \ref{lemma:limit_V}
  • Remark 4.1
  • Theorem 4.1
  • Theorem 5.1
  • Theorem 5.2
  • Remark 5.1
  • Theorem 6.1
  • Remark B.1: Compactness of the manifold implies boundedness of gradients.
  • Corollary B.1: Theorem 2 of lapinski2019multivariate
  • proof
  • ...and 23 more