PCM : Picard Consistency Model for Fast Parallel Sampling of Diffusion Models

Junhyuk So, Jiwoong Shin, Chaeyeon Jang, Eunhyeok Park

TL;DR

The paper tackles slow diffusion-model sampling by introducing the Picard Consistency Model (PCM), which trains a diffusion predictor to output the fixed-point solution $X^*$ from intermediate Picard trajectory states, enabling accelerated parallel sampling. A key innovation is model switching, which blends PCM with the base model to preserve exact convergence while achieving speedups; EMA stabilization and LoRA-based weight-space switching further enhance training stability and efficiency. Empirical results across image generation and robotic control show PCM delivers up to about 2.71x speedups relative to sequential denoising and about 1.77x relative to standard Picard iteration, without sacrificing output fidelity. The work positions PCM as a distillation-free, convergence-preserving acceleration method that leverages consistent fixed-point prediction and flexible switching strategies for practical, high-throughput diffusion inference.

Abstract

Recently, diffusion models have achieved significant advances in vision, text, and robotics. However, they still face slow generation speeds due to sequential denoising processes. To address this, a parallel sampling method based on Picard iteration was introduced, effectively reducing sequential steps while ensuring exact convergence to the original output. Nonetheless, Picard iteration does not guarantee faster convergence, which can still result in slow generation in practice. In this work, we propose a new parallelization scheme, the Picard Consistency Model (PCM), which significantly reduces the number of generation steps in Picard iteration. Inspired by the consistency model, PCM is directly trained to predict the fixed-point solution, or the final output, at any stage of the convergence trajectory. Additionally, we introduce a new concept called model switching, which addresses PCM's limitations and ensures exact convergence. Extensive experiments demonstrate that PCM achieves up to a 2.71x speedup over sequential sampling and a 1.77x speedup over Picard iteration across various tasks, including image generation and robotic control.
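To make the Picard-based parallel sampling described in the abstract concrete, the following is a minimal sketch on a toy ODE, not the paper's implementation. The `drift` function, the Euler discretization, and the names `sequential_sample` and `picard_sample` are assumptions made for illustration; in a real sampler the drift would come from the diffusion model's denoiser.

```python
import numpy as np

def drift(x, t):
    # Toy stand-in for the model-driven drift (assumption, not from the paper).
    return -x

def sequential_sample(x0, ts):
    # Standard sequential Euler solve: each step waits for the previous one.
    xs = [x0]
    for i in range(len(ts) - 1):
        h = ts[i + 1] - ts[i]
        xs.append(xs[-1] + h * drift(xs[-1], ts[i]))
    return np.stack(xs)

def picard_sample(x0, ts, tol=1e-10, max_iters=None):
    # Picard iteration on the whole trajectory X of shape (T + 1, n).
    T = len(ts) - 1
    max_iters = max_iters or T
    hs = np.diff(ts)[:, None]                    # step sizes, shape (T, 1)
    X = np.repeat(x0[None], T + 1, axis=0)       # initialize every step at x0
    for k in range(1, max_iters + 1):
        # All T drift evaluations are independent, so they can run as one batch.
        F = drift(X[:-1], ts[:-1, None])
        X_new = np.concatenate([x0[None], x0[None] + np.cumsum(hs * F, axis=0)])
        if np.max(np.abs(X_new - X)) < tol:      # fixed point reached
            return X_new, k
        X = X_new
    return X, max_iters

ts = np.linspace(0.0, 1.0, 21)
x0 = np.array([1.0, -2.0, 0.5])
ref = sequential_sample(x0, ts)
X, k = picard_sample(x0, ts)
print(k, np.max(np.abs(X - ref)))
```

The fixed point of the parallel update matches the sequential trajectory, but the number of sweeps `k` depends on the drift and the tolerance. That dependence is the convergence-speed gap the abstract says PCM is meant to close.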

Paper Structure

This paper contains 29 sections, 1 theorem, 23 equations, 16 figures, 8 tables, 2 algorithms.

Key Result

Theorem 1

Let $F$ be continuous in $t$ and Lipschitz continuous in $x$. Then $\Phi$ is a contraction mapping with a unique fixed-point solution: $d(\Phi(X),\Phi(Y)) \le q \cdot d(X,Y)$ for all $X, Y$ in a complete metric space with metric $d$, where $q \in [0,1)$.
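As a worked consequence of the contraction property, the standard Banach fixed-point argument gives an explicit error bound for the iterates $X^{k+1} = \Phi(X^k)$. This is a textbook derivation added for illustration, not an equation quoted from the paper; $X^0$ denotes an arbitrary initial trajectory and $X^*$ the fixed point.

$$
\begin{aligned}
d(X^{k+1}, X^{k}) &\le q \, d(X^{k}, X^{k-1}) \le \dots \le q^{k} \, d(X^{1}, X^{0}), \\
d(X^{k}, X^{*}) &\le \sum_{j=k}^{\infty} d(X^{j+1}, X^{j}) \le \sum_{j=k}^{\infty} q^{j} \, d(X^{1}, X^{0}) = \frac{q^{k}}{1-q} \, d(X^{1}, X^{0}).
\end{aligned}
$$

The bound guarantees convergence for any $q \in [0,1)$, but when $q$ is close to 1 the factor $q^k$ shrinks slowly, so the theorem alone does not ensure that few iterations suffice.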

Figures (16)

  • Figure 1: Visual comparison of (a) the Consistency Model [song2023consistency], (b) Picard iteration, and (c) our Picard Consistency Model (PCM). (a) The Consistency Model is trained to directly predict the final output $x_T$ from any point $x_t\in\mathbb{R}^n$ on the denoising trajectory. (b) Picard iteration forms a denoising trajectory along tensors $X\in\mathbb{R}^{T\times n}$. (c) Inspired by the Consistency Model, we train PCM to predict the final point $X^*$ from any intermediate step of the Picard trajectory.
  • Figure 2: Overview of our method. Our goal is to accelerate the Picard iteration process, $X^{k+1} \leftarrow \Phi(X^k)$, by training with a loss function that minimizes the distance between a random point on the Picard trajectory and the fixed-point solution $X^*$. During inference, to preserve the convergence properties of the Picard iteration, we smoothly transition from our trained model $\theta_{PCM}$ to the original model $\theta_{base}$ using a scheduling function $\lambda(k)$ in feature space or weight space through LoRA.
  • Figure 3: Comparison of convergence errors between Picard iteration at $k$ and the ground truth. (a) PCM w/o model switching converges faster than the naive Picard, but the error begins to increase at the transition point and converges to a different point due to the modified weights. (b) PCM with model switching safely converges to the exact same output while ensuring accelerated convergence speed. Experiments are conducted on CelebA using DDIM.
  • Figure 4: Training dynamics of PCM. (a) Naive PCT is highly unstable during training: both the transition point and the final convergence error change at every training iteration. (b) With EMA, the error curve shifts smoothly across training iterations. Experiments are conducted on CelebA using DDIM.
  • Figure 5: Qualitative comparison of Sequential (a), Picard (b), and PCM (c) on Stable Diffusion v1.4 using DDIM. All methods converge to the exact same solution, with PCM running at 2.8x lower latency than Sequential.
  • ...and 11 more figures
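The model switching described in the Figure 2 caption can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' code: the linear form of the schedule, its parameters `k_switch` and `k_end`, the convention that the schedule value is the weight on the PCM branch, and all function names are hypothetical. The LoRA parameterization assumed here is the usual one, where the PCM weights equal the base weights plus a low-rank update `B @ A`.

```python
import numpy as np

def switch_schedule(k, k_switch=4, k_end=8):
    # Hypothetical lambda(k): weight on the PCM branch.
    # Stays at 1 until k_switch, then decays linearly to 0 at k_end.
    return float(np.clip((k_end - k) / (k_end - k_switch), 0.0, 1.0))

def feature_space_blend(out_pcm, out_base, lam):
    # Feature-space switching: interpolate the two models' outputs.
    return lam * out_pcm + (1.0 - lam) * out_base

def weight_space_blend(W_base, lora_A, lora_B, lam):
    # Weight-space switching via LoRA: scale the low-rank update by lam.
    # At lam = 0 the weights are exactly the base model's weights.
    return W_base + lam * (lora_B @ lora_A)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))      # base weight of one linear layer
A = rng.normal(size=(2, 4))      # LoRA down-projection (rank 2)
B = rng.normal(size=(4, 2))      # LoRA up-projection
x = rng.normal(size=4)

for k in range(0, 10, 2):
    lam = switch_schedule(k)
    y = weight_space_blend(W, A, B, lam) @ x
    print(k, lam, np.linalg.norm(y - W @ x))
```

Once the schedule reaches 0, every later Picard update uses the base model alone, so the iteration can only settle at the base model's fixed point. For a single linear layer the two blends coincide; for a deep nonlinear network they generally differ, and weight-space switching needs only one forward pass per update.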

Theorems & Definitions (2)

  • Definition 1: Integral Operator
  • Theorem 1: Picard–Lindelöf Theorem [coddington1956theory]