PCM: Picard Consistency Model for Fast Parallel Sampling of Diffusion Models
Junhyuk So, Jiwoong Shin, Chaeyeon Jang, Eunhyeok Park
TL;DR
The paper tackles slow diffusion-model sampling with the Picard Consistency Model (PCM), a diffusion predictor trained to output the fixed-point solution $X^*$ from any intermediate state of the Picard trajectory, so that parallel sampling converges in fewer iterations. To address PCM's limitations, model switching blends PCM with the base model, preserving exact convergence while keeping the speedup; EMA stabilization and LoRA-based weight-space switching further improve training stability and efficiency. Across image generation and robotic control, PCM achieves up to a 2.71x speedup over sequential denoising and up to a 1.77x speedup over standard Picard iteration, without sacrificing output fidelity. The work positions PCM as a distillation-free, convergence-preserving acceleration method for practical, high-throughput diffusion inference.
Abstract
Recently, diffusion models have achieved significant advances in vision, text, and robotics. However, they still face slow generation speeds due to sequential denoising processes. To address this, a parallel sampling method based on Picard iteration was introduced, effectively reducing sequential steps while ensuring exact convergence to the original output. Nonetheless, Picard iteration does not guarantee faster convergence, which can still result in slow generation in practice. In this work, we propose a new parallelization scheme, the Picard Consistency Model (PCM), which significantly reduces the number of generation steps in Picard iteration. Inspired by the consistency model, PCM is directly trained to predict the fixed-point solution, or the final output, at any stage of the convergence trajectory. Additionally, we introduce a new concept called model switching, which addresses PCM's limitations and ensures exact convergence. Extensive experiments demonstrate that PCM achieves up to a 2.71x speedup over sequential sampling and a 1.77x speedup over Picard iteration across various tasks, including image generation and robotic control.
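To make the baseline concrete, the following is a minimal sketch of Picard iteration as a parallel alternative to sequential sampling. It is not the authors' implementation: the `drift` function is a hypothetical toy stand-in for a diffusion model's denoising direction, the Euler discretization and the tolerance are assumptions, and the function names are illustrative. Each Picard sweep evaluates the drift at all time steps at once (the part that can run in parallel) and rebuilds the whole trajectory with a cumulative sum. The fixed point of this update is exactly the sequential Euler trajectory.

```python
import numpy as np


def drift(x, t):
    # Hypothetical toy drift; a real sampler would call the diffusion model here.
    return -x + np.sin(t)


def sequential_sample(x0, ts):
    # Standard sequential Euler sampling: one drift evaluation per step, in order.
    x = x0
    for i in range(len(ts) - 1):
        x = x + (ts[i + 1] - ts[i]) * drift(x, ts[i])
    return x


def picard_sample(x0, ts, tol=1e-10, max_iters=None):
    # Picard iteration over the whole trajectory:
    #   x_i^{k+1} = x_0 + sum_{j < i} h_j * drift(x_j^k, t_j)
    n = len(ts) - 1
    h = np.diff(ts)
    max_iters = n if max_iters is None else max_iters
    traj = np.full(n + 1, x0, dtype=float)  # initial guess: constant trajectory
    for k in range(1, max_iters + 1):
        # All n drift evaluations are independent, so they can be batched.
        f = drift(traj[:-1], ts[:-1])
        new_traj = np.concatenate(([x0], x0 + np.cumsum(h * f)))
        if np.max(np.abs(new_traj - traj)) < tol:
            return new_traj[-1], k
        traj = new_traj
    return traj[-1], max_iters


ts = np.linspace(0.0, 1.0, 51)
x_seq = sequential_sample(1.0, ts)
x_pic, sweeps = picard_sample(1.0, ts)
print(f"sequential: {x_seq:.8f}, picard: {x_pic:.8f}, sweeps: {sweeps}")
```

In this sketch the number of sweeps is bounded by the number of time steps, since each sweep makes at least one more trajectory point exact, but nothing forces it to be much smaller. That gap is what the abstract refers to when it says Picard iteration does not guarantee faster convergence, and it is the iteration count that PCM aims to reduce.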
