Latent Structure Emergence in Diffusion Models via Confidence-Based Filtering

Wei Wei; Yizhou Zeng; Kuntian Chen; Sophie Langer; Mariia Seleznova; Hung-Hsu Chou

TL;DR

This work investigates whether the high-dimensional latent space of diffusion models encodes class-relevant structure that can be revealed through seed-level analysis. Pairing a DDIM generator with a pretrained classifier and confidence-based filtering, the authors show that high-confidence seeds yield strong cross-level predictability and clear latent-space separability, whereas these effects wash out for low-confidence seeds. They introduce a latent-structure visualization pipeline (LDA–UMAP) and quantify structure emergence with an LDA discriminability score, showing that the structure resides in the latent space itself and is amplified by confidence filtering. A practical upshot is a post-hoc conditional-generation approach that selects seeds by confidence and target class without modifying or retraining the diffusion model. The findings offer a new lens on latent-space analysis and a scalable conditioning mechanism for deterministic diffusion processes, with implications for efficiency and interpretability across generative models.

Abstract

Diffusion models rely on a high-dimensional latent space of initial noise seeds, yet it remains unclear whether this space contains sufficient structure to predict properties of the generated samples, such as their classes. In this work, we investigate the emergence of latent structure through the lens of confidence scores assigned by a pre-trained classifier to generated samples. We show that while the latent space appears largely unstructured when considering all noise realizations, restricting attention to initial noise seeds that produce high-confidence samples reveals pronounced class separability. By comparing class predictability across noise subsets of varying confidence and examining the class separability of the latent space, we find evidence of class-relevant latent structure that becomes observable only under confidence-based filtering. As a practical implication, we discuss how confidence-based filtering enables conditional generation as an alternative to guidance-based methods.
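The filtering and seed-selection idea described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that a seed's label is the classifier's argmax class on the generated sample, that its confidence is the corresponding maximum class probability, and that seeds are ranked into equal-size confidence levels with Level 1 the most confident. The function names (`label_and_confidence`, `assign_levels`, `select_seeds`) and the toy probabilities are hypothetical.

```python
from typing import List, Sequence, Tuple


def label_and_confidence(probs: Sequence[float]) -> Tuple[int, float]:
    """Return (argmax class, max probability) for one generated sample."""
    label = max(range(len(probs)), key=lambda k: probs[k])
    return label, probs[label]


def assign_levels(confidences: Sequence[float], n_levels: int = 10) -> List[int]:
    """Rank seeds by confidence and split them into equal-size levels.

    Level 1 holds the most confident seeds, level n_levels the least confident.
    """
    n = len(confidences)
    order = sorted(range(n), key=lambda i: confidences[i], reverse=True)
    levels = [0] * n
    for rank, i in enumerate(order):
        levels[i] = rank * n_levels // n + 1
    return levels


def select_seeds(
    class_probs: Sequence[Sequence[float]],
    target_class: int,
    max_level: int = 1,
    n_levels: int = 10,
) -> List[int]:
    """Indices of seeds labelled target_class whose level is <= max_level."""
    labelled = [label_and_confidence(p) for p in class_probs]
    levels = assign_levels([conf for _, conf in labelled], n_levels)
    return [
        i
        for i, ((label, _), level) in enumerate(zip(labelled, levels))
        if label == target_class and level <= max_level
    ]


# Hypothetical classifier outputs for four seeds and three classes
# (assumption: one probability vector per deterministically generated sample).
probs = [
    [0.98, 0.01, 0.01],
    [0.40, 0.35, 0.25],
    [0.05, 0.90, 0.05],
    [0.70, 0.20, 0.10],
]
selected = select_seeds(probs, target_class=0, max_level=1, n_levels=2)
```

Because a deterministic sampler maps each seed to a single sample, the selected indices can be reused later to regenerate samples of the target class without touching the diffusion model itself, which is the sense in which the selection is post hoc.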

Paper Structure (31 sections, 2 theorems, 14 equations, 11 figures, 1 table)

Introduction
Related Work
Background and Formulation
Diffusion Model Preliminaries
Latent Classification
Confidence-Based Filtering
Methodology
Experiments and Results
Models and Benchmark Datasets
Predictability
Cross-Level Label Predictability.
Accuracy vs. Predicted Confidence (Logit Prediction).
Latent Structure Emergence
Latent Structure via LDA–UMAP.
Quantifying Structure Emergence.
...and 16 more sections

Key Result

Lemma 3.1

Suppose the vector field of the diffusion ODE is continuous on $[0,T] \times \mathbb{R}^d$ and locally Lipschitz in $x$, uniformly in $t$. Then the diffusion ODE admits a unique regular flow map $\phi_{t}(x): [0,T] \times \mathbb{R}^{d} \rightarrow \mathbb{R}^{d}$, i.e., [...]. Then the corresponding probability flow $p_t(x)$ satisfies [...]
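
For orientation, here is a sketch of the standard form such a statement takes, assuming the diffusion ODE is written as $\dot{x} = v(t, x)$. The symbol $v$ and the initial density $p_0$ are placeholders introduced here, not notation taken from the paper, whose exact formulation may differ.

```latex
\begin{aligned}
\frac{\mathrm{d}}{\mathrm{d}t}\,\phi_t(x) &= v\bigl(t, \phi_t(x)\bigr), \qquad \phi_0(x) = x, \\
\partial_t p_t(x) + \nabla \cdot \bigl(p_t(x)\, v(t, x)\bigr) &= 0, \qquad p_t = (\phi_t)_{\#}\, p_0 .
\end{aligned}
```

The first line is the defining property of a flow map. The second is the continuity equation for the density transported by that flow, with $p_t$ the pushforward of $p_0$ under $\phi_t$.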

Figures (11)

Figure 1: Correspondence of data classes and latent classes
Figure 2: Cross-level classification accuracy heatmaps for diffusion noise across confidence levels through neural networks. (a) DDIM: A pronounced high-accuracy region appears in the high-confidence regime, indicating transferable predictability across levels (see the appendix for per-cell accuracies). (b) DDPM: Accuracy is heterogeneous across levels with no recognizable structure (see the appendix for per-cell accuracies).
Figure 3: Structure visualization via projection, with points colored by class label. Left: across confidence levels. Right: across methods.
Figure 4: Prediction accuracy as a function of predicted confidence for the logit-prediction approach. Each point aggregates seeds within a confidence range; bubble size indicates the number of seeds in that bin.
Figure 5: Overlay of low-confidence (Level 10) noises onto the LDA–UMAP embedding learned from high-confidence (Level 1) noises. While Level 1 samples form coherent class-aligned clusters, Level 10 noises do not form distinct groups and instead populate interstitial regions between these clusters.
...and 6 more figures

Theorems & Definitions (6)

Lemma 3.1
Definition 3.2: Data and latent class
Theorem 3.3
Definition 3.4: Label and confidence of seed
proof
proof
