Advancing Diffusion Models: Alias-Free Resampling and Enhanced Rotational Equivariance

Md Fahim Anjum

TL;DR

This work integrates alias-free resampling layers into the UNet architecture of diffusion models without adding extra trainable parameters, highlighting the potential of such theory-driven enhancements to improve image quality in generative models while maintaining model efficiency.

Abstract

Recent advances in image generation, particularly via diffusion models, have led to impressive improvements in image synthesis quality. Despite this, diffusion models are still challenged by model-induced artifacts and limited stability in image fidelity. In this work, we hypothesize that the primary cause of this issue is improper resampling, which introduces aliasing into the diffusion model, and that careful alias-free resampling dictated by image processing theory can improve the model's performance in image synthesis. We propose the integration of alias-free resampling layers into the UNet architecture of diffusion models without adding extra trainable parameters, thereby maintaining computational efficiency. We then assess whether these theory-driven modifications enhance image quality and rotational equivariance. Our experimental results on benchmark datasets, including CIFAR-10, MNIST, and MNIST-M, reveal consistent gains in image quality, particularly in terms of FID and KID scores. Furthermore, we propose a modified diffusion process that enables user-controlled rotation of generated images without requiring additional training. Our findings highlight the potential of theory-driven enhancements such as alias-free resampling to improve image quality in generative models while maintaining model efficiency. They also point to future research directions, such as incorporating these enhancements into video-generating diffusion models, enabling deeper exploration of the applications of alias-free resampling in generative modeling.

Paper Structure

This paper contains 36 sections, 5 equations, 9 figures, 2 tables, 3 algorithms.

Figures (9)

  • Figure 1: Alias-free resampling via anti-aliasing low-pass filters. Panel A shows a $3\times 3$ anti-aliasing filter and its frequency response with Kaiser window ($\beta = 1$). Panel B shows conventional resampling operations ($2\times$ downsampling followed by $2\times$ upsampling) and panel C shows alias-free resampling operations with anti-aliasing filters (downfilter and upfilter steps) and upsampling with interleaved zeros (upsample step).
  • Figure 2: Overview of the conventional baseline UNet (Panel A) and our architectural revisions (Panel B) of the baseline UNet in diffusion models.
  • Figure 3: Improving rotational consistency with a modified diffusion process: Panel A shows the classical diffusion process, while Panel B illustrates our proposed modified diffusion process, achieving counter-clockwise rotation.
  • Figure 4: Comparison of images generated by diffusion models trained on the CIFAR-10 dataset.
  • Figure 5: Comparison of images generated by models trained on the CIFAR-10 (top), MNIST-M (middle), and MNIST (bottom) datasets with a specific desired rotation.
  • ...and 4 more figures
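
The resampling steps named in the Figure 1 caption (downfilter, upsample with interleaved zeros, upfilter) can be illustrated with a small NumPy sketch. This is not the paper's implementation: the function names, the cutoff of 0.25 cycles per sample, the reflect padding, and the gain of 4 after zero insertion are all assumptions chosen for illustration. Only the $3\times 3$ filter size and the Kaiser window with $\beta = 1$ come from the caption.

```python
import numpy as np

def kaiser_lowpass_2d(size=3, beta=1.0, cutoff=0.25):
    """Separable low-pass kernel: windowed sinc taps, normalized to unit sum.

    size and beta follow the Figure 1 caption; cutoff (in cycles per sample)
    is an assumed value suited to 2x resampling.
    """
    n = np.arange(size) - (size - 1) / 2          # tap positions centered on 0
    taps = np.sinc(2 * cutoff * n) * np.kaiser(size, beta)
    taps /= taps.sum()                            # unit DC gain
    return np.outer(taps, taps)

def filter2d(x, kernel):
    """Same-size 2D filtering with reflect padding (an assumed boundary rule)."""
    r = kernel.shape[0] // 2
    padded = np.pad(x, r, mode="reflect")
    h, w = x.shape
    out = np.zeros((h, w), dtype=float)
    for i in range(kernel.shape[0]):
        for j in range(kernel.shape[1]):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out

def downsample(x, kernel):
    """Downfilter step: low-pass first, then keep every second sample."""
    return filter2d(x, kernel)[::2, ::2]

def upsample(x, kernel):
    """Upsample step (interleaved zeros) followed by the upfilter step."""
    up = np.zeros((2 * x.shape[0], 2 * x.shape[1]), dtype=float)
    up[::2, ::2] = x
    # Zero insertion cuts the mean by 4 in 2D, so the kernel is scaled by 4.
    return filter2d(up, 4.0 * kernel)

if __name__ == "__main__":
    kernel = kaiser_lowpass_2d()
    # Vertical stripes at the highest representable frequency.
    stripes = np.tile([1.0, -1.0], (8, 4))
    naive = stripes[::2, ::2]                     # plain subsampling
    filtered = downsample(stripes, kernel)        # filtered subsampling
    print("naive max |value|:   ", np.abs(naive).max())
    print("filtered max |value|:", np.abs(filtered).max())
```

Plain subsampling turns the stripe pattern into a constant image, which is the aliasing the filters are meant to suppress, whereas the filtered path attenuates the stripes before samples are discarded. A $3\times 3$ kernel is only a coarse approximation of an ideal low-pass filter, so some residual aliasing should be expected for intermediate frequencies.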