UltraPixel: Advancing Ultra-High-Resolution Image Synthesis to New Peaks
Jingjing Ren, Wenbo Li, Haoyu Chen, Renjing Pei, Bin Shao, Yong Guo, Long Peng, Fenglong Song, Lei Zhu
TL;DR
UltraPixel addresses the challenge of ultra-high-resolution image generation by combining a cascade diffusion framework with low-resolution semantic guidance. It uses implicit neural representations to continuously upsample that guidance and scale-aware normalization to support multiple resolutions within a shared, compact latent space, producing 1K–6K outputs efficiently. The approach yields state-of-the-art or competitive perceptual metrics and faster inference than several baselines, while requiring only modest training data. The work also demonstrates potential for controllable generation and personalization, though these uses call for attention to dataset quality and responsible use.
Abstract
Ultra-high-resolution image generation poses great challenges, such as increased semantic planning complexity and detail synthesis difficulties, alongside substantial training resource demands. We present UltraPixel, a novel architecture utilizing cascade diffusion models to generate high-quality images at multiple resolutions (e.g., 1K to 6K) within a single model, while maintaining computational efficiency. UltraPixel leverages semantics-rich representations of lower-resolution images in the later denoising stage to guide the whole generation of highly detailed high-resolution images, significantly reducing complexity. Furthermore, we introduce implicit neural representations for continuous upsampling and scale-aware normalization layers adaptable to various resolutions. Notably, both low- and high-resolution processes are performed in the most compact space and share the majority of parameters, with less than 3% additional parameters for high-resolution outputs, greatly enhancing training and inference efficiency. Our model achieves fast training with reduced data requirements, producing photo-realistic high-resolution images and demonstrating state-of-the-art performance in extensive experiments.
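
The abstract names scale-aware normalization layers but does not give their form. As a rough illustration of the general idea only, the sketch below normalizes a feature vector and then modulates it with a gain and bias derived from the target scale. The sinusoidal embedding of the log scale, the linear weights `w_gain` and `w_bias`, and both function names are hypothetical choices for this example and are not taken from the paper.

```python
import math


def scale_embedding(scale, dim=4):
    """Hypothetical sinusoidal embedding of log(scale); not the paper's design."""
    s = math.log(scale)
    return [
        math.sin(s * (2 ** i)) if i % 2 == 0 else math.cos(s * (2 ** i))
        for i in range(dim)
    ]


def scale_aware_norm(x, scale, w_gain, w_bias, eps=1e-5):
    """Normalize x to zero mean and unit variance, then apply a gain and bias
    computed from the scale embedding (assumed linear projection)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    normed = [(v - mean) / math.sqrt(var + eps) for v in x]

    emb = scale_embedding(scale, dim=len(w_gain))
    gain = 1.0 + sum(w * e for w, e in zip(w_gain, emb))
    bias = sum(w * e for w, e in zip(w_bias, emb))
    return [gain * v + bias for v in normed]


# Usage: the same features and weights, modulated for two target scales.
features = [1.0, 2.0, 3.0, 4.0]
w_gain = [0.5, 0.0, 0.0, 0.0]   # illustrative values, not learned
w_bias = [0.0, 0.0, 0.0, 0.0]
base = scale_aware_norm(features, 1.0, w_gain, w_bias)   # reference scale
large = scale_aware_norm(features, 4.0, w_gain, w_bias)  # 4x the reference
```

With all weights set to zero the layer reduces to plain normalization at every scale, so in this sketch the scale-dependent behavior comes entirely from the small set of modulation weights. That is one way a shared backbone could adapt to several resolutions with few extra parameters, in the spirit of the abstract's claim.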
