CutDiffusion: A Simple, Fast, Cheap, and Strong Diffusion Extrapolation Method
Mingbao Lin, Zhihang Lin, Wengyi Zhan, Liujuan Cao, Rongrong Ji
TL;DR
The paper addresses the high cost and complexity of generating high-resolution images from pre-trained low-resolution diffusion models. It introduces CutDiffusion, a tuning-free, two-stage diffusion extrapolation method that splits patch-based extrapolation into comprehensive structure denoising and subsequent detail refinement, employing pixel interaction and pixel relocation. The approach delivers fast, memory-efficient inference with fewer patches and a single upscale step, while achieving strong generation quality compared with both tuning-based and tuning-free baselines. This work lowers the barrier to high-resolution diffusion by enabling cheaper, faster, and more accessible high-resolution image synthesis on consumer hardware, demonstrated on SDXL with thorough ablations and comparisons.
Abstract
Transforming large pre-trained low-resolution diffusion models to cater to higher-resolution demands, i.e., diffusion extrapolation, significantly improves diffusion adaptability. We propose tuning-free CutDiffusion, which aims to simplify and accelerate the diffusion extrapolation process, making it more affordable while improving performance. CutDiffusion follows the existing patch-wise extrapolation paradigm but cuts a standard patch diffusion process into an initial phase focused on comprehensive structure denoising and a subsequent phase dedicated to specific detail refinement. Comprehensive experiments highlight four advantages of CutDiffusion: (1) simple method construction that enables a concise higher-resolution diffusion process without third-party engagement; (2) fast inference speed, achieved through a single-step higher-resolution diffusion process and fewer inference patches; (3) cheap GPU cost, resulting from patch-wise inference and fewer patches during the comprehensive structure denoising; (4) strong generation performance, stemming from the emphasis on specific detail refinement.
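The summary above names pixel interaction and pixel relocation but does not define them. The sketch below is one possible reading, not the authors' implementation. It assumes that the structure-denoising patches are strided sub-grids of the high-resolution latent, that "interaction" randomly swaps values between patches at the same position, and that "relocation" interleaves the patches back into the full grid. All function names are hypothetical, and the denoising model itself is omitted.

```python
import numpy as np

def split_into_strided_patches(latent, scale):
    # Assumption: patch (dy, dx) takes every `scale`-th pixel starting at
    # offset (dy, dx), so each patch is a low-resolution view of the whole image.
    return [latent[dy::scale, dx::scale]
            for dy in range(scale) for dx in range(scale)]

def interact_pixels(patches, rng):
    # Assumption: at every spatial position, randomly permute which patch
    # holds which value, so the patches share information while denoising.
    stacked = np.stack(patches)                       # (P, h, w, c)
    num_patches, h, w = stacked.shape[:3]
    perm = np.argsort(rng.random((num_patches, h, w)), axis=0)
    mixed = np.take_along_axis(stacked, perm[..., None], axis=0)
    return list(mixed)

def relocate_pixels(patches, scale):
    # Assumption: write each patch back to the strided positions it came
    # from, producing one high-resolution latent for detail refinement.
    h, w, c = patches[0].shape
    out = np.empty((h * scale, w * scale, c), dtype=patches[0].dtype)
    for idx, patch in enumerate(patches):
        dy, dx = divmod(idx, scale)
        out[dy::scale, dx::scale] = patch
    return out

rng = np.random.default_rng(0)
latent = rng.standard_normal((8, 8, 4))               # toy high-resolution latent
patches = split_into_strided_patches(latent, scale=2) # four 4x4x4 patches
mixed = interact_pixels(patches, rng)
restored = relocate_pixels(patches, scale=2)          # round-trips to `latent`
```

Under these assumptions, a 2x extrapolation needs only four non-overlapping patches in the first phase, which is consistent with the abstract's claim of fewer inference patches. The overlapping-patch refinement of the second phase is not shown.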
