Towards a mathematical theory for consistency training in diffusion models

Gen Li, Zhihan Huang, Yuting Wei

TL;DR

The paper addresses theoretical gaps in consistency training for diffusion models by establishing a non-asymptotic guarantee that a sequence of learned consistency mappings $\{f_t\}$ can yield single-step sampling that closely matches the data distribution. By framing the training as iterative consistency learning and imposing Lipschitz and estimation-error assumptions, the authors derive a Wasserstein bound $W_1(f_T(X_T), X_1) \le C_1 \frac{L_f^3 d^{5/2} \log^{5} T}{T} + \varepsilon + \varepsilon_{\mathcal{F}}$, and show that $T = \tilde{O}\big( \frac{L_f^3 d^{5/2}}{\varepsilon+\varepsilon_{\mathcal{F}}} \big)$ steps suffice to achieve $W_1 \le 2(\varepsilon+\varepsilon_{\mathcal{F}})$. The framework decouples training and sampling, enabling efficient one-shot sampling while providing a quantitative benchmark for fidelity that depends explicitly on the data dimension $d$ and Lipschitz constant $L_f$. The results offer a principled justification for consistency models and guide practical design of training schedules and model capacity for reliable fast sampling.

Abstract

Consistency models, which were proposed to mitigate the high computational overhead during the sampling phase of diffusion models, facilitate single-step sampling while attaining state-of-the-art empirical performance. When integrated into the training phase, consistency models attempt to train a sequence of consistency functions capable of mapping any point at any time step of the diffusion process to its starting point. Despite the empirical success, a comprehensive theoretical understanding of consistency training remains elusive. This paper takes a first step towards establishing theoretical underpinnings for consistency models. We demonstrate that, in order to generate samples within $\varepsilon$ proximity to the target in distribution (measured by some Wasserstein metric), it suffices for the number of steps in consistency learning to exceed the order of $d^{5/2}/\varepsilon$, with $d$ the data dimension. Our theory offers rigorous insights into the validity and efficacy of consistency models, illuminating their utility in downstream inference tasks.
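The $d^{5/2}/\varepsilon$ scaling can be made concrete with a small numerical sketch. It assumes the bound takes the form quoted in the TL;DR, with a discretization term $C_1 L_f^3 d^{5/2} \log^5 T / T$. The values `C1 = 1.0` and `L_f = 1.0` are placeholders, since the source does not give numerical values for them, and the function names `discretization_term` and `min_steps` are hypothetical helpers introduced only for this illustration.

```python
import math


def discretization_term(T, d, L_f=1.0, C1=1.0):
    """Evaluate C1 * L_f^3 * d^(5/2) * log^5(T) / T.

    C1 and L_f default to placeholder values (assumptions, not from the paper).
    """
    return C1 * L_f**3 * d**2.5 * math.log(T) ** 5 / T


def min_steps(d, eps_total, L_f=1.0, C1=1.0):
    """Smallest integer T >= 149 with discretization_term(T) <= eps_total.

    The search starts at 149 because log^5(T) / T is decreasing once
    T exceeds e^5 (about 148.4), so a bracketing search is valid there.
    """
    lo = 149
    if discretization_term(lo, d, L_f, C1) <= eps_total:
        return lo
    # Double the upper end until the term drops below the target.
    hi = lo * 2
    while discretization_term(hi, d, L_f, C1) > eps_total:
        lo, hi = hi, hi * 2
    # Bisect: term(lo) > eps_total >= term(hi) holds throughout.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if discretization_term(mid, d, L_f, C1) <= eps_total:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    eps_total = 0.1  # stands in for epsilon + epsilon_F
    for d in (4, 8, 16):
        print(d, min_steps(d, eps_total))
```

Under these placeholder constants, doubling $d$ multiplies the right-hand side of the requirement by $2^{5/2} \approx 5.66$, so the required $T$ grows by slightly more than that factor once the logarithmic term is accounted for. The absolute step counts printed here carry no meaning beyond this scaling, because the true value of $C_1$ is not specified.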

Paper Structure (45 sections, 8 theorems, 130 equations)


Key Result

Theorem 1

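Suppose the learning rates are selected according to eqn:alpha-t and the target distribution satisfies property eqn:boundness. Under Assumptions ass:Lipschitz and ass:estimation, the learned consistency function obeys $W_1(f_T(X_T), X_1) \le C_1 \frac{L_f^3 d^{5/2} \log^{5} T}{T} + \varepsilon + \varepsilon_{\mathcal{F}}$ for some universal constant $C_{1} > 0$, where $X_T\sim \mathcal{N}(0,I_d)$.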
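
The step complexity quoted in the TL;DR follows from a bound of this form in one line. The worked step below is a sketch that takes the TL;DR's bound as given; it introduces no constants beyond $C_1$, $L_f$, $d$, $\varepsilon$ and $\varepsilon_{\mathcal{F}}$. Suppose $T$ is large enough that the $T$-dependent term is at most the two error terms combined:

$$
C_1 \frac{L_f^3 d^{5/2} \log^{5} T}{T} \le \varepsilon + \varepsilon_{\mathcal{F}}
\quad\Longleftrightarrow\quad
\frac{T}{\log^{5} T} \ge \frac{C_1 L_f^3 d^{5/2}}{\varepsilon + \varepsilon_{\mathcal{F}}}.
$$

Substituting into the bound then gives

$$
W_1(f_T(X_T), X_1) \le (\varepsilon + \varepsilon_{\mathcal{F}}) + \varepsilon + \varepsilon_{\mathcal{F}} = 2(\varepsilon + \varepsilon_{\mathcal{F}}).
$$

The condition on $T / \log^{5} T$ is what the notation $T = \tilde{O}\big( \frac{L_f^3 d^{5/2}}{\varepsilon+\varepsilon_{\mathcal{F}}} \big)$ abbreviates, with the $\log^{5} T$ factor absorbed into the tilde.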

Theorems & Definitions (8)

  • Theorem 1
  • Lemma 1: li2023towards, Lemma 1
  • Lemma 2: li2023towards, Lemma 2
  • Lemma 3
  • Lemma 4
  • Lemma 6
  • Lemma 7
  • Lemma 8