Latent Space Super-Resolution for Higher-Resolution Image Generation with Diffusion Models
Jinho Jeong, Sangmin Han, Jinwoo Kim, Seon Joo Kim
TL;DR
This work tackles the challenge of generating very high-resolution images with diffusion models by addressing two key bottlenecks: manifold deviation during latent-space upsampling and insufficient texture in RGB upsampling. It introduces Latent Space Super-Resolution (LSR) to align low- and high-resolution latent manifolds and Region-wise Noise Addition (RNA) to inject detail in high-frequency regions; together they form the LSRNA framework. Empirical results show that LSRNA improves both latent- and RGB-based reference methods (e.g., DemoFusion and Pixelsmith), achieving state-of-the-art scores across multiple resolutions while running faster because fewer denoising steps are needed. The approach advances practical high-resolution diffusion-based generation and offers robust, edge-guided texture enhancement for real-world, megapixel-scale outputs.
Abstract
In this paper, we propose LSRNA, a novel framework for higher-resolution (exceeding 1K) image generation using diffusion models by leveraging super-resolution directly in the latent space. Existing diffusion models struggle to scale beyond their training resolutions, often producing structural distortions or content repetition. Reference-based methods address these issues by upsampling a low-resolution reference to guide higher-resolution generation. However, they face significant challenges: upsampling in latent space often causes manifold deviation, which degrades output quality, whereas upsampling in RGB space tends to produce overly smoothed outputs. To overcome these limitations, LSRNA combines Latent space Super-Resolution (LSR) for manifold alignment and Region-wise Noise Addition (RNA) to enhance high-frequency details. Our extensive experiments demonstrate that methods integrating LSRNA outperform state-of-the-art reference-based methods across various resolutions and metrics, and show the critical role of latent space upsampling in preserving detail and sharpness. The code is available at https://github.com/3587jjh/LSRNA.
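To make the idea behind Region-wise Noise Addition more concrete, the following is a minimal, illustrative sketch and not the authors' implementation. It assumes that "high-frequency regions" are found with a Sobel edge detector and that the per-pixel noise strength is a linear function of the normalized edge map; the function names (`sobel_edge_map`, `region_wise_noise_addition`) and the parameters `sigma_min` and `sigma_max` are hypothetical. The edge map is also assumed to already match the latent's spatial size.

```python
import numpy as np


def sobel_edge_map(gray):
    """Sobel gradient magnitude of a 2-D array, normalized to [0, 1]."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    ky = kx.T
    h, w = gray.shape
    padded = np.pad(gray.astype(np.float64), 1, mode="edge")
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    # Accumulate the 3x3 filter response by shifting the padded image.
    for i in range(3):
        for j in range(3):
            window = padded[i:i + h, j:j + w]
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    mag = np.hypot(gx, gy)
    peak = mag.max()
    return mag / peak if peak > 0 else mag


def region_wise_noise_addition(latent, edge_map, sigma_min=0.0, sigma_max=0.5, seed=0):
    """Add Gaussian noise to a (C, H, W) latent, scaled per pixel by edge strength.

    Assumption: noise std grows linearly from sigma_min (flat regions)
    to sigma_max (strongest edges). These values are illustrative only.
    """
    rng = np.random.default_rng(seed)
    sigma = sigma_min + (sigma_max - sigma_min) * edge_map  # shape (H, W)
    noise = rng.standard_normal(latent.shape)
    return latent + sigma[None, :, :] * noise


# Toy example: a vertical step edge in an 8x8 guide image.
guide = np.zeros((8, 8))
guide[:, 4:] = 1.0
edges = sobel_edge_map(guide)
latent = np.zeros((4, 8, 8))  # stand-in for an upsampled latent
noisy = region_wise_noise_addition(latent, edges)
```

In this toy setup, noise is injected only in the columns next to the step edge, while flat regions are left untouched because `sigma_min` is zero. A real pipeline would apply this to the super-resolved latent before the remaining denoising steps.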
