Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models
NVIDIA: Yuval Atzmon, Maciej Bala, Yogesh Balaji, Tiffany Cai, Yin Cui, Jiaojiao Fan, Yunhao Ge, Siddharth Gururani, Jacob Huffman, Ronald Isaac, Pooya Jannaty, Tero Karras, Grace Lam, J. P. Lewis, Aaron Licata, Yen-Chen Lin, Ming-Yu Liu, Qianli Ma, Arun Mallya, Ashlee Martino-Tarr, Doug Mendez, Seungjun Nah, Chris Pruett, Fitsum Reda, Jiaming Song, Ting-Chun Wang, Fangyin Wei, Xiaohui Zeng, Yu Zeng, Qinsheng Zhang
TL;DR
Edify Image presents a novel pixel-space diffusion framework, based on multi-scale Laplacian decomposition, that enables high-fidelity, controllable image generation at 1K and 4K resolutions. By introducing a dimension-varying diffusion process and a two-stage cascaded architecture (a 256-resolution base model followed by a 1K upsampler), the method delivers photorealistic outputs from long prompts, with support for diverse aspect ratios and camera controls. The work further extends these capabilities to 4K upsampling, ControlNet-augmented conditioning, 360° HDR panorama generation, and lightweight finetuning for personalization; it remains compatible with pre-trained ControlNets and demonstrates fairness and style transfer across diverse subjects and styles. Collectively, Edify Image offers scalable, controllable image synthesis with practical implications for content creation and synthetic data generation.
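To make "multi-scale Laplacian decomposition" concrete, here is a minimal NumPy sketch of a Laplacian pyramid: each level stores the high-frequency residual between an image and the upsampled version of its half-resolution copy, and the coarsest level stores the remaining low-pass image. The choice of 2×2 average pooling for downsampling, nearest-neighbour upsampling, and the function names `laplacian_decompose` and `laplacian_reconstruct` are illustrative assumptions, not the filters or code used by Edify Image.

```python
import numpy as np


def downsample(x: np.ndarray) -> np.ndarray:
    """2x2 average pooling over the first two axes (H and W must be even)."""
    h, w = x.shape[:2]
    return x.reshape(h // 2, 2, w // 2, 2, *x.shape[2:]).mean(axis=(1, 3))


def upsample(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling over the first two axes."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


def laplacian_decompose(img: np.ndarray, levels: int) -> list:
    """Split an image into `levels` bands, finest first.

    bands[0] .. bands[-2] are high-frequency residuals at decreasing
    resolution; bands[-1] is the coarsest low-pass image.
    """
    bands = []
    current = img
    for _ in range(levels - 1):
        low = downsample(current)
        bands.append(current - upsample(low))  # detail lost by downsampling
        current = low
    bands.append(current)
    return bands


def laplacian_reconstruct(bands: list) -> np.ndarray:
    """Invert laplacian_decompose by upsampling and adding residuals back."""
    current = bands[-1]
    for band in reversed(bands[:-1]):
        current = upsample(current) + band
    return current
```

For a 256×256 RGB image and three levels, the bands have shapes 256×256×3, 128×128×3 and 64×64×3, and reconstruction recovers the input up to floating-point rounding. With these particular operators, average-pooling any residual band gives exactly zero, so each band carries only detail that the coarser levels cannot represent.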
Abstract
We introduce Edify Image, a family of diffusion models capable of generating photorealistic image content with pixel-perfect accuracy. Edify Image utilizes cascaded pixel-space diffusion models trained using a novel Laplacian diffusion process, in which image signals at different frequency bands are attenuated at varying rates. Edify Image supports a wide range of applications, including text-to-image synthesis, 4K upsampling, ControlNets, 360° HDR panorama generation, and finetuning for image customization.
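The idea of attenuating frequency bands "at varying rates" can be illustrated with a two-band toy forward process. The sketch below assumes a hypothetical linear schedule in which the high-frequency band fades to zero by `t = 0.5` and the low-frequency band by `t = 1.0`; the cutoff times, the linear form, and the name `attenuated_signal` are illustrative assumptions rather than the schedule used in the paper, and the Gaussian noise that a real diffusion process would add is omitted.

```python
import numpy as np


def downsample(x: np.ndarray) -> np.ndarray:
    """2x2 average pooling over the first two axes (H and W must be even)."""
    h, w = x.shape[:2]
    return x.reshape(h // 2, 2, w // 2, 2, *x.shape[2:]).mean(axis=(1, 3))


def upsample(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling over the first two axes."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


def attenuated_signal(x0: np.ndarray, t: float, cutoffs=(0.5, 1.0)) -> np.ndarray:
    """Signal part of a hypothetical two-band forward process at time t in [0, 1].

    cutoffs[0]: time by which the high-frequency band has fully faded.
    cutoffs[1]: time by which the low-frequency band has fully faded.
    Noise is deliberately omitted to isolate the attenuation behaviour.
    """
    low = downsample(x0)
    high = x0 - upsample(low)  # high-frequency residual
    a_high = np.clip(1.0 - t / cutoffs[0], 0.0, 1.0)  # fades faster
    a_low = np.clip(1.0 - t / cutoffs[1], 0.0, 1.0)   # fades slower
    return a_low * upsample(low) + a_high * high
```

At `t = 0` the function returns the original image. Once `t` passes the high-band cutoff, the remaining signal is exactly an upsampled half-resolution image, so it could be stored and denoised at the lower resolution; this gives some intuition for why attenuating bands at different rates allows a dimension-varying process.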
