Chain-of-Trajectories: Unlocking the Intrinsic Generative Optimality of Diffusion Models via Graph-Theoretic Planning

Ping Chen, Xiang Liu, Xingpeng Zhang, Fei Shen, Xun Gong, Zhaoxiang Liu, Zezhou Chen, Huan Hu, Kai Wang, Shiguo Lian

Abstract

Diffusion models operate in a reflexive System 1 mode, constrained by a fixed, content-agnostic sampling schedule. This rigidity arises from the curse of state dimensionality, where the combinatorial explosion of possible states in the high-dimensional noise manifold renders explicit trajectory planning intractable and leads to systematic computational misallocation. To address this, we introduce Chain-of-Trajectories (CoTj), a training-free framework enabling System 2 deliberative planning. Central to CoTj is Diffusion DNA, a low-dimensional signature that quantifies per-stage denoising difficulty and serves as a proxy for the high-dimensional state space, allowing us to reformulate sampling as graph planning on a directed acyclic graph. Through a Predict-Plan-Execute paradigm, CoTj dynamically allocates computational effort to the most challenging generative phases. Experiments across multiple generative models demonstrate that CoTj discovers context-aware trajectories, improving output quality and stability while reducing redundant computation. This work establishes a new foundation for resource-aware, planning-based diffusion modeling. The code is available at https://github.com/UnicomAI/CoTj.
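To make the idea of "sampling as graph planning on a directed acyclic graph" concrete, the sketch below plans a fixed-budget path through a discretized time grid with dynamic programming. It is an illustration under stated assumptions, not the CoTj implementation: the function name `plan_trajectory`, the per-interval `difficulty` list (a stand-in for a Diffusion DNA profile), and the edge cost `(j - i) * sum(difficulty[i:j])` are all hypothetical choices made for this example.

```python
from typing import List, Sequence


def plan_trajectory(difficulty: Sequence[float], num_steps: int) -> List[int]:
    """Choose `num_steps` forward jumps through a time grid at minimum toy cost.

    Nodes are grid indices 0..T (0 = start of sampling, T = final sample).
    Edges only go forward (i -> j with i < j), so the graph is a DAG.
    ASSUMPTION: edge (i, j) costs (j - i) * sum(difficulty[i:j]), a made-up
    proxy for the error of covering a hard interval in a single jump.
    """
    T = len(difficulty)  # difficulty[k] describes the interval k -> k + 1
    if not 1 <= num_steps <= T:
        raise ValueError("num_steps must be between 1 and len(difficulty)")

    # Prefix sums give sum(difficulty[i:j]) in O(1).
    prefix = [0.0]
    for d in difficulty:
        prefix.append(prefix[-1] + d)

    def edge_cost(i: int, j: int) -> float:
        return (j - i) * (prefix[j] - prefix[i])

    inf = float("inf")
    # best[k][j]: cheapest cost of reaching node j using exactly k edges.
    best = [[inf] * (T + 1) for _ in range(num_steps + 1)]
    parent = [[-1] * (T + 1) for _ in range(num_steps + 1)]
    best[0][0] = 0.0

    for k in range(1, num_steps + 1):
        for j in range(1, T + 1):
            for i in range(j):
                if best[k - 1][i] == inf:
                    continue  # node i is unreachable with k - 1 edges
                cost = best[k - 1][i] + edge_cost(i, j)
                if cost < best[k][j]:
                    best[k][j] = cost
                    parent[k][j] = i

    # Walk the parent pointers back from the final node to node 0.
    path = [T]
    node = T
    for k in range(num_steps, 0, -1):
        node = parent[k][node]
        path.append(node)
    path.reverse()
    return path


if __name__ == "__main__":
    # Hypothetical profile with a hard phase in the middle of the grid.
    profile = [1.0, 1.0, 1.0, 1.0, 8.0, 8.0, 1.0, 1.0]
    print(plan_trajectory(profile, num_steps=4))
```

With a flat profile this toy cost reduces to the sum of squared jump lengths, so the planner recovers a uniform schedule. A non-uniform profile pushes it to take shorter jumps through the expensive intervals, which is the kind of content-dependent allocation the abstract describes.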

Paper Structure (28 sections, 34 equations, 12 figures, 5 tables)

Figures (12)

  • Figure 1: The Paradigm Shift from Fixed Scheduling to Chain-of-Trajectories (CoTj). Top: Comparison of inference mechanisms. Standard diffusion acts as a System 1 process, relying on a fixed time series that constrains its potential. In contrast, CoTj introduces a System 2 approach via a CoTj-guided Optimal Planner, generating an optimized time series tailored to the input. Bottom: Real-world comparisons across modalities. In text-to-image generation (QwenImage [Wu2025QwenImage], Z-image-Turbo [Cai2025zimage]), CoTj achieves superior visual results under limited computational resources. In text-to-video tasks (Wan2.2 [wan2025wan]), CoTj significantly enhances motion dynamics and realism while maintaining high spatial quality.
  • Figure 2: Super-Node DAG for trajectory planning. Visualization of the dense reverse-time DAG $\mathcal{G}=(\mathcal{V},\mathcal{E})$, showing aggregated high-dimensional states as Super-Nodes and possible transitions as edges. The DAG encodes all feasible generative trajectories, allowing the CoTj planner to find globally optimal paths while avoiding regressive or non-convergent regions.
  • Figure 3: Entropic divergence patterns reveal intrinsic Diffusion DNA. Profiles of $\mathcal{C}(t)$ expose heterogeneous error-decay dynamics across inputs. Structurally constrained, low-entropy visual targets exhibit rapid stabilization, whereas abstract and visually uncertain compositions lead to prolonged refinement phases. The shaded region illustrates the additional computational burden induced by entropic complexity, highlighting that generative difficulty is governed primarily by intrinsic visual uncertainty rather than surface-level prompt properties (e.g., length or linguistic complexity).
  • Figure 4: Statistical landscape of Diffusion DNA. We analyze large-scale structural properties of difficulty profiles using 1,000,000 prompt pairs derived from the PickScore training set. (A) DNA cosine similarity distribution: pairwise similarities concentrate in a high-similarity regime, suggesting that diffusion difficulty profiles inhabit a structured region of the generative landscape. (B) Semantic–difficulty relationship: the correlation between semantic embedding similarity and DNA similarity is low ($r = 0.046$), revealing a statistical decoupling between surface-level prompt semantics and intrinsic generative difficulty.
  • Figure 5: Predicting Diffusion DNA from condition embeddings. Distribution of pairwise cosine similarities between predicted ($\hat{\mathcal{D}}$) and true DNA trajectories. The model effectively captures structural patterns of generative difficulty across low- and high-entropy scenarios, providing a reliable signal for resource-aware, System 2 trajectory planning.
  • ...and 7 more figures