NoiseShift: Resolution-Aware Noise Recalibration for Better Low-Resolution Image Generation

Ruozhen He, Moayed Haji-Ali, Ziyan Yang, Vicente Ordonez

TL;DR

NoiseShift is a training-free method that recalibrates the noise level on which the denoiser is conditioned according to image resolution. It is compatible with existing models, mitigates resolution-dependent artifacts, and improves the quality of low-resolution image generation.

Abstract

Text-to-image diffusion models trained on a fixed set of resolutions often fail to generalize, even when asked to generate images at lower resolutions than those seen during training. High-resolution text-to-image generators are currently unable to easily offer an out-of-the-box budget-efficient alternative to their users who might not need high-resolution images. We identify a key technical insight in diffusion models that when addressed can help tackle this limitation: Noise schedulers have unequal perceptual effects across resolutions. The same level of noise removes disproportionately more signal from lower-resolution images than from high-resolution images, leading to a train-test mismatch. We propose NoiseShift, a training-free method that recalibrates the noise level of the denoiser conditioned on resolution size. NoiseShift requires no changes to model architecture or sampling schedule and is compatible with existing models. When applied to Stable Diffusion 3, Stable Diffusion 3.5, and Flux-Dev, quality at low resolutions is significantly improved. On LAION-COCO, NoiseShift improves SD3.5 by 15.89%, SD3 by 8.56%, and Flux-Dev by 2.44% in FID on average. On CelebA, NoiseShift improves SD3.5 by 10.36%, SD3 by 5.19%, and Flux-Dev by 3.02% in FID on average. These results demonstrate the effectiveness of NoiseShift in mitigating resolution-dependent artifacts and enhancing the quality of low-resolution image generation.
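
The claim that the same noise level removes more signal at low resolution can be illustrated with a small numerical experiment. The sketch below is not taken from the paper: it assumes a rectified-flow style forward process $x_t = (1-\sigma)x_0 + \sigma\epsilon$, a synthetic smooth image, and block averaging as the downsampling operator. It compares how much coarse structure survives when noise is added at 256$\times$256 versus directly at 64$\times$64.

```python
import numpy as np

def block_mean(img, factor):
    """Downsample a 2D array by averaging non-overlapping factor x factor blocks."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def add_noise(x0, sigma, rng):
    """Assumed forward process: x_t = (1 - sigma) * x0 + sigma * eps."""
    return (1.0 - sigma) * x0 + sigma * rng.standard_normal(x0.shape)

def corr(a, b):
    """Pearson correlation between two arrays, used as a crude 'surviving signal' score."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

rng = np.random.default_rng(0)

# Synthetic smooth "image" (an assumption standing in for real image content).
n = 256
yy, xx = np.mgrid[0:n, 0:n] / n
high = np.sin(2 * np.pi * xx) * np.cos(2 * np.pi * yy)  # 256 x 256
low = block_mean(high, 4)                               # 64 x 64

sigma = 0.7
noisy_high = add_noise(high, sigma, rng)  # noise applied at high resolution
noisy_low = add_noise(low, sigma, rng)    # same sigma applied at low resolution

# Compare both against the clean low-resolution image at the same scale.
corr_high = corr(block_mean(noisy_high, 4), low)
corr_low = corr(noisy_low, low)
print(f"structure surviving at 256x256: {corr_high:.2f}")
print(f"structure surviving at 64x64:   {corr_low:.2f}")
```

In this toy setting the high-resolution image keeps more of its coarse structure because per-pixel noise averages out over neighbouring pixels, while the low-resolution image has no such redundancy. This is only one way to read the mismatch described in the abstract.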

Paper Structure

This paper contains 29 sections, 6 equations, 11 figures, 3 tables, 2 algorithms.

Figures (11)

  • Figure 3: Qualitative comparison of Flux-Dev. Generated image examples before and after applying NoiseShift are shown for CelebA (left) and LAION-COCO (right).
  • Figure 4: Calibrated conditioning noise levels across resolutions. We plot the default sampling noise schedule (gray) alongside the resolution-specific calibrated conditioning $\hat{\sigma}_t$ for SD3 (left) and SD3.5 (right). At the default resolution (1024$\times$1024), the curves align closely. At lower resolutions, the optimal $\hat{\sigma}_t$ curves consistently deviate upward, reflecting a need for stronger conditioning to compensate for perceptual degradation.
  • Figure 5: Ablation studies. Ablation studies on the number of samples used during calibration and the new sigmas obtained at 128$\times$128 and 256$\times$256.
  • Figure 6: Qualitative comparison of SD3.5. Generated image examples before and after applying NoiseShift are shown for CelebA (top) and LAION-COCO (bottom).
  • Figure 7: Qualitative comparison of SD3.5. Generated image examples before and after applying NoiseShift are shown for CelebA (top) and LAION-COCO (bottom).
  • ...and 6 more figures
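
The Figure 4 caption describes a calibrated conditioning value $\hat{\sigma}_t$ that differs from the sampling noise level at low resolutions. The paper's calibration algorithm is not reproduced in this excerpt, so the following is only a hypothetical sketch of one way such a value could be found: a coarse grid search that picks the conditioning value minimizing denoising error. The function names `toy_denoiser` and `calibrate_sigma`, the linear Gaussian denoiser, and the use of a smaller signal scale as a stand-in for low-resolution data are all assumptions made for illustration.

```python
import numpy as np

def toy_denoiser(x_t, sigma_cond, signal_std=1.0):
    """Hypothetical denoiser: linear MMSE estimate of x0 for Gaussian data with
    std `signal_std`, assuming x_t = (1 - sigma) * x0 + sigma * eps."""
    a = 1.0 - sigma_cond
    gain = a * signal_std**2 / (a**2 * signal_std**2 + sigma_cond**2)
    return gain * x_t

def calibrate_sigma(denoise, x0, sigma_t, candidates, rng):
    """Noise x0 at the true level sigma_t, then return the candidate conditioning
    value whose denoised output has the lowest mean squared error."""
    eps = rng.standard_normal(x0.shape)
    x_t = (1.0 - sigma_t) * x0 + sigma_t * eps
    errors = [np.mean((denoise(x_t, s) - x0) ** 2) for s in candidates]
    return float(candidates[int(np.argmin(errors))])

rng = np.random.default_rng(0)

# Stand-in for data whose statistics differ from what the denoiser was built
# for (std 0.5 instead of 1.0). This is an assumption, not the paper's setup.
x0_shifted = 0.5 * rng.standard_normal((64, 64, 64))

candidates = np.linspace(0.05, 0.95, 91)  # coarse grid of conditioning values
sigma_t = 0.5                             # noise level actually applied
sigma_hat = calibrate_sigma(toy_denoiser, x0_shifted, sigma_t, candidates, rng)
print(f"sampling sigma: {sigma_t:.2f}, calibrated conditioning: {sigma_hat:.2f}")
```

In this toy the selected conditioning value lands above the applied noise level, which matches the direction of the upward deviation reported in the caption. The sampling schedule itself is untouched; only the value passed to the denoiser changes.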