PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher
Dongjun Kim, Chieh-Hsin Lai, Wei-Hsiang Liao, Yuhta Takida, Naoki Murata, Toshimitsu Uesaka, Yuki Mitsufuji, Stefano Ermon
TL;DR
PaGoDA tackles the prohibitive cost of training high-resolution diffusion models with a three-stage pipeline. It first trains a diffusion model on downsampled data. It then distills that model into a one-step generator using DDIM inversion. Finally, it progressively grows a decoder that upsamples to higher resolutions. The authors provide theoretical guarantees of optimality and training stability for an objective that combines a reconstruction loss with an adversarial loss. They also extend the method with classifier-free guidance (CFG) for text-conditioned generation. Empirically, PaGoDA achieves state-of-the-art FID on ImageNet at every resolution from $64\times64$ to $512\times512$ without CFG. It also delivers competitive text-to-image results with CFG and can be trained efficiently on modest hardware. The pipeline could broaden access to high-quality diffusion training and enable scalable, controllable image generation, with potential integration into latent diffusion model pipelines and downstream inversion tasks.
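The distillation stage relies on DDIM inversion. Running the deterministic DDIM sampler backwards maps each downsampled training image to a noise latent, which yields noise-data pairs of the kind a one-step generator can be fit to with a reconstruction loss. The sketch below shows only the inversion and its round trip, not PaGoDA's code. It assumes a variance-preserving schedule and a toy Gaussian data distribution, N(0, sigma_d^2 I), whose optimal noise predictor has a closed form. That closed-form predictor stands in for the pretrained low-resolution teacher. `gaussian_eps`, `ddim_step`, `ddim_invert`, and `ddim_sample` are hypothetical names.

```python
import numpy as np


def gaussian_eps(sigma_d):
    """Toy stand-in for a pretrained teacher (hypothetical).

    If x0 ~ N(0, sigma_d^2 I) and x_t = sqrt(abar) x0 + sqrt(1 - abar) eps,
    the MSE-optimal noise prediction E[eps | x_t] has this closed form.
    """
    def eps_fn(x, abar):
        var_t = abar * sigma_d**2 + (1.0 - abar)  # marginal variance of x_t
        return np.sqrt(1.0 - abar) * x / var_t
    return eps_fn


def ddim_step(x, abar_from, abar_to, eps_fn):
    """One deterministic (eta = 0) DDIM update between two noise levels.

    Moving to a smaller abar adds noise (inversion); moving to a larger abar
    removes it (sampling). The noise is predicted at the starting level only.
    """
    eps = eps_fn(x, abar_from)
    x0_hat = (x - np.sqrt(1.0 - abar_from) * eps) / np.sqrt(abar_from)
    return np.sqrt(abar_to) * x0_hat + np.sqrt(1.0 - abar_to) * eps


def ddim_invert(x0, abars, eps_fn):
    """Encode data to a noise latent; abars must decrease from 1.0 toward 0."""
    x = x0
    for a_from, a_to in zip(abars[:-1], abars[1:]):
        x = ddim_step(x, a_from, a_to, eps_fn)
    return x


def ddim_sample(z, abars, eps_fn):
    """Decode a noise latent back to data by walking the same grid in reverse."""
    x = z
    rev = abars[::-1]
    for a_from, a_to in zip(rev[:-1], rev[1:]):
        x = ddim_step(x, a_from, a_to, eps_fn)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sigma_d = 0.5
    x0 = sigma_d * rng.standard_normal(8)             # toy "downsampled images"
    abars = np.cos(np.linspace(0.0, 1.5, 1001)) ** 2  # abar from 1.0 down to about 0.005
    eps_fn = gaussian_eps(sigma_d)
    z = ddim_invert(x0, abars, eps_fn)                # (z, x0) could serve as input/target pairs
    x_rec = ddim_sample(z, abars, eps_fn)
    print("max round-trip error:", np.max(np.abs(x_rec - x0)))
```

Each DDIM step evaluates the noise predictor only at its starting level, so the inversion is not exactly invertible. In this toy setting the round-trip error shrinks as the grid of noise levels gets finer.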
Abstract
Diffusion models perform remarkably well at generating high-dimensional content but are computationally intensive, especially during training. We propose Progressive Growing of Diffusion Autoencoder (PaGoDA), a novel pipeline that reduces training costs through three stages: training diffusion on downsampled data, distilling the pretrained diffusion model, and progressive super-resolution. With this pipeline, PaGoDA cuts the cost of training its diffusion model by $64\times$ because it trains on $8\times$ downsampled data. At inference, it generates in a single step and achieves state-of-the-art performance on ImageNet at all resolutions from $64\times64$ to $512\times512$, as well as on text-to-image generation. PaGoDA's pipeline can also be applied directly in latent space, adding further compression on top of the pretrained autoencoder in Latent Diffusion Models (e.g., Stable Diffusion). The code is available at https://github.com/sony/pagoda.
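The $64\times$ figure matches the pixel count: downsampling by $8\times$ along each side leaves $1/8^2 = 1/64$ of the pixels. The sketch below makes that arithmetic explicit and lists the intermediate resolutions between the low-resolution teacher and the target. The listing assumes that each progressive super-resolution stage doubles the side length, a schedule the abstract does not state. Treating training cost as proportional to pixel count is likewise a simplification. `pipeline_resolutions` is a hypothetical helper, not part of the PaGoDA code.

```python
def pipeline_resolutions(target_res, downsample_factor):
    """Hypothetical helper for the resolution arithmetic in the abstract.

    Returns the teacher's training resolution, the factor by which the pixel
    count shrinks, and the resolutions visited if each progressive
    super-resolution stage doubles the side length (an assumed schedule).
    """
    if downsample_factor < 1 or downsample_factor & (downsample_factor - 1):
        raise ValueError("downsample_factor must be a power of two")
    if target_res % downsample_factor:
        raise ValueError("target_res must be divisible by downsample_factor")

    teacher_res = target_res // downsample_factor
    # Downsampling each side by s leaves 1/s^2 of the pixels.
    pixel_ratio = (target_res * target_res) // (teacher_res * teacher_res)

    stages = []
    res = teacher_res
    while res < target_res:
        res *= 2  # one assumed 2x upsampling stage of the growing decoder
        stages.append(res)
    return teacher_res, pixel_ratio, stages


if __name__ == "__main__":
    teacher, ratio, stages = pipeline_resolutions(512, 8)
    print(f"teacher at {teacher}x{teacher}, {ratio}x fewer pixels, stages {stages}")
    # -> teacher at 64x64, 64x fewer pixels, stages [128, 256, 512]
```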
