Fast and Scalable Analytical Diffusion

Xinyi Shang; Peng Sun; Jingyu Lin; Zhiqiang Shen

TL;DR

This work challenges the prevailing assumption that the entire training set is needed at every denoising step. It identifies the phenomenon of Posterior Progressive Concentration and proposes Dynamic Time-Aware Golden Subset Diffusion (GoldDiff), a training-free framework that decouples inference complexity from dataset size.

Abstract

Analytical diffusion models offer a mathematically transparent path to generative modeling by formulating the denoising score as an empirical-Bayes posterior mean. However, this interpretability comes at a prohibitive cost: the standard formulation necessitates a full-dataset scan at every timestep, scaling linearly with dataset size. In this work, we present the first systematic study addressing this scalability bottleneck. We challenge the prevailing assumption that the entire training data is necessary, uncovering the phenomenon of Posterior Progressive Concentration: the effective golden support of the denoising score is not static but shrinks asymptotically from the global manifold to a local neighborhood as the signal-to-noise ratio increases. Capitalizing on this, we propose Dynamic Time-Aware Golden Subset Diffusion (GoldDiff), a training-free framework that decouples inference complexity from dataset size. Instead of static retrieval, GoldDiff uses a coarse-to-fine mechanism to dynamically pinpoint the "Golden Subset" for inference. Theoretically, we derive rigorous bounds guaranteeing that our sparse approximation converges to the exact score. Empirically, GoldDiff achieves a $\mathbf{71\times}$ speedup on AFHQ while matching or exceeding the performance of full-scan baselines. Most notably, we demonstrate the first successful scaling of analytical diffusion to ImageNet-1K, unlocking a scalable, training-free paradigm for large-scale generative modeling.
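To make the cost argument concrete, the following is a minimal sketch of an empirical-Bayes posterior-mean denoiser and a top-$k$ truncation of it. It assumes a forward process of the form $\mathbf{x}_t = \alpha_t \mathbf{x}_0 + \sigma_t \boldsymbol{\epsilon}$; the names `alpha`, `sigma`, `full_scan_denoiser`, and `topk_denoiser` are illustrative choices, not taken from the paper. The truncated version still computes every logit, so it only illustrates the sparse approximation, not GoldDiff's coarse-to-fine retrieval.

```python
import numpy as np


def logits_full_scan(x_t, data, alpha, sigma):
    """Gaussian log-likelihood logits for every training point (one full scan)."""
    sq_dist = np.sum((x_t[None, :] - alpha * data) ** 2, axis=1)
    return -sq_dist / (2.0 * sigma ** 2)


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = np.exp(logits - logits.max())
    return z / z.sum()


def full_scan_denoiser(x_t, data, alpha, sigma):
    """Posterior mean of x_0 given x_t under the empirical data distribution."""
    weights = softmax(logits_full_scan(x_t, data, alpha, sigma))
    return weights @ data


def topk_denoiser(x_t, data, alpha, sigma, k):
    """Same estimator, but aggregating only the k largest-logit samples.

    Assumption: weights are renormalized over the retained subset.
    """
    logits = logits_full_scan(x_t, data, alpha, sigma)
    idx = np.argpartition(-logits, k - 1)[:k]  # indices of the k largest logits
    return softmax(logits[idx]) @ data[idx]
```

With `k` equal to the dataset size the two functions agree; at low noise levels a small `k` is typically enough because the weights concentrate on a few nearby samples.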

Paper Structure (22 sections, 2 theorems, 23 equations, 6 figures, 7 tables)

Introduction
Related Work
Denoising Diffusion Models.
Time-Aware Golden Subset Diffusion
Revisiting Analytical Denoisers
Unbiased Weight Estimation
Why Dynamic Retrieval? A Sensitivity Analysis
Theoretical-grounded Dynamic Selection
Truncation Error Bound and Complexity Analysis
Experiments
Experimental Setup
Efficacy and Efficiency Comparison
Ablation Study
Conclusion
Theoretical Analysis and Proofs
...and 7 more sections

Key Result

Theorem 1

Assume the logits $\ell_i$ are sorted such that $\ell_{(1)} \ge \dots \ge \ell_{(N)}$. If the estimator aggregates only the top-$k$ samples, the truncation error is bounded by an expression involving $R$ and $\Delta_k$, where $R = \max_{\mathbf{x} \in \mathcal{D}} \|\mathbf{x}\|_2$ is the data radius and $\Delta_k \triangleq \ell_{(1)} - \ell_{(k+1)}$ is the Logit Gap.
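The explicit bound is not reproduced above. As a sketch only, here is how a bound of this kind can be derived for a softmax-weighted mean whose top-$k$ weights are renormalized. The symbols $w_i$, $\mu$, $\mu_k$, $\varepsilon_k$, and $S_k$ (the index set of the top-$k$ logits) are assumed notation, and the constant or exact form of the paper's Theorem 1 may differ.

```latex
% Assumed setup: softmax weights, full mean, tail mass, renormalized top-k mean.
\[
w_i = \frac{e^{\ell_i}}{\sum_{j=1}^{N} e^{\ell_j}}, \qquad
\mu = \sum_{i=1}^{N} w_i \mathbf{x}_i, \qquad
\varepsilon_k = \sum_{i \notin S_k} w_i, \qquad
\mu_k = \frac{1}{1-\varepsilon_k} \sum_{i \in S_k} w_i \mathbf{x}_i .
\]

% Step 1: split the full mean into its top-k part and its tail.
% Both \mu_k and the tail points have norm at most R.
\[
\mu = (1-\varepsilon_k)\,\mu_k + \sum_{i \notin S_k} w_i \mathbf{x}_i
\;\Longrightarrow\;
\|\mu - \mu_k\|_2
= \Big\| \sum_{i \notin S_k} w_i \mathbf{x}_i - \varepsilon_k\,\mu_k \Big\|_2
\le 2R\,\varepsilon_k .
\]

% Step 2: bound the tail mass using the normalizer >= e^{\ell_{(1)}}
% and \ell_{(i)} <= \ell_{(k+1)} for all i > k.
\[
\varepsilon_k = \sum_{i=k+1}^{N} \frac{e^{\ell_{(i)}}}{\sum_{j} e^{\ell_j}}
\le (N-k)\,\frac{e^{\ell_{(k+1)}}}{e^{\ell_{(1)}}}
= (N-k)\,e^{-\Delta_k} .
\]

% Combined: the error decays exponentially in the Logit Gap.
\[
\|\mu - \mu_k\|_2 \le 2R\,(N-k)\,e^{-\Delta_k} .
\]
```

Under these assumptions the truncation error vanishes exponentially as the Logit Gap $\Delta_k$ grows, which is the qualitative behavior the theorem statement points to.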

Figures (6)

Figure 1: The Phenomenon of Posterior Progressive Concentration. $\bigstar$ denotes the initial noise, and the target distribution is the Moons data [pedregosa2011scikit]. As the diffusion process reverses from pure noise to data (left to right), the golden support of the posterior distribution dynamically shrinks from the global manifold to a localized neighborhood.
Figure 2: Impact of Biased Weight Estimation. Due to biased weight estimation, PCA [Lukoianov2025Locality] produces inherently over-smoothed outputs even with sufficient denoising steps.
Figure 3: Analysis of the SOTA method PCA [Lukoianov2025Locality] for Subset Selection. (a) Evolution of the weight distribution during the denoising process. (b)--(c) Sensitivity analysis: (b) Performance evaluation of the analytical denoiser across varying random subset sizes ($N_{\text{sub}} \in \{10, 100, 1000, 5000\}$) compared to the full CIFAR-10 dataset. (c) Visualization of intermediate generation outputs.
Figure 4: Qualitative Comparison. We compare our GoldDiff (5th row) against four baselines: Optimal (1st row), Wiener filter (2nd row), Kamb [kamb2024analytic] (3rd row), and the PCA model (4th row). All images are generated from the same initial noise using 10 steps of DDIM [song2020denoising]. The last row displays reference samples generated by a trained U-Net [ho2020denoising].
Figure 5: Qualitative Comparison of Conditional Denoising on ImageNet-1K. We visualize samples generated by our method compared to two PCA-based baselines: Original PCA [Lukoianov2025Locality] and Unbiased PCA (PCA-U) for the class "Tench".
...and 1 more figures
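The concentration effect described in the Figure 1 caption can be reproduced on a toy problem. The sketch below is an illustration under stated assumptions, not the paper's experiment: it uses a NumPy-only stand-in for the two-moons data (`two_moons` is a hypothetical helper), a noising model $\mathbf{x}_t = \mathbf{x}_0 + \sigma \boldsymbol{\epsilon}$, and a hypothetical `support_size` measure that counts the smallest number of training points holding 99% of the posterior mass.

```python
import numpy as np


def two_moons(n, rng):
    """NumPy-only stand-in for a two-moons dataset (two interleaved half circles)."""
    half = n // 2
    t1 = rng.uniform(0.0, np.pi, half)
    t2 = rng.uniform(0.0, np.pi, n - half)
    upper = np.stack([np.cos(t1), np.sin(t1)], axis=1)
    lower = np.stack([1.0 - np.cos(t2), 0.5 - np.sin(t2)], axis=1)
    return np.concatenate([upper, lower], axis=0)


def support_size(x_t, data, sigma, mass=0.99):
    """Smallest number of training points whose posterior weights sum to `mass`."""
    logits = -np.sum((x_t - data) ** 2, axis=1) / (2.0 * sigma ** 2)
    w = np.exp(logits - logits.max())
    w /= w.sum()
    cum = np.cumsum(np.sort(w)[::-1])  # cumulative mass, largest weights first
    return min(int(np.searchsorted(cum, mass)) + 1, len(w))


rng = np.random.default_rng(0)
data = two_moons(2000, rng)
x0 = data[0]                 # clean sample that is being noised
eps = rng.normal(size=2)     # one fixed noise direction, reused at every level

# Sweep from high noise to low noise, mimicking the reverse process.
sigmas = (10.0, 1.0, 0.1, 0.01)
sizes = [support_size(x0 + s * eps, data, s) for s in sigmas]
for s, n in zip(sigmas, sizes):
    print(f"sigma={s:>5}: {n} points hold 99% of the posterior mass")
```

At high noise nearly the whole dataset contributes, while at low noise only a small neighborhood of the clean sample carries meaningful weight, which is the motivation for selecting a time-dependent subset rather than a fixed one.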

Theorems & Definitions (2)

Theorem 1: Posterior Truncation Error Bound
Corollary 1: Sample-wise Error Bound for Local Estimators
