
Does Diffusion Beat GAN in Image Super Resolution?

Denis Kuznedelev, Valerii Startsev, Daniil Shlenskii, Sergey Kastryulin

TL;DR

A GAN-based model can achieve results comparable or superior to a diffusion-based model in the ISR problem when architecture, model and dataset sizes, and computational budget are matched. The paper also explores how popular design choices, such as text conditioning and augmentation, affect the performance of ISR models.

Abstract

There is a prevalent opinion that diffusion-based models outperform GAN-based counterparts in the Image Super Resolution (ISR) problem. However, in most studies, diffusion-based ISR models employ larger networks and are trained longer than the GAN baselines. This raises the question of whether the high performance stems from the superiority of the diffusion paradigm or is a consequence of the increased scale and greater computational resources of contemporary studies. In our work, we thoroughly compare diffusion-based and GAN-based Super Resolution models under controlled settings, with both approaches having matched architecture, model and dataset sizes, and computational budget. We show that a GAN-based model can achieve results comparable or superior to a diffusion-based model. Additionally, we explore the impact of popular design choices, such as text conditioning and augmentation, on the performance of ISR models, showcasing their effect in several downstream tasks. We will release the inference code and weights of our scaled GAN.

Paper Structure (52 sections, 10 equations, 14 figures, 13 tables)


Figures (14)

  • Figure 1: Side-by-side (SbS) comparison between two subsequent checkpoints showcases faster convergence of GAN models. Green corresponds to statistical improvement on the current step, grey to equality. Three evaluations without improvement in a row indicate convergence.
  • Figure 2: Visual comparison between GAN and diffusion SR model from our work and the baselines on SR($\times 4$). Zoom in for the best view.
  • Figure 3: SbS comparison between text-conditional (based on a proprietary model called XL and UMT5) and unconditional SR models. Bar plots show that additional text conditioning does not noticeably improve perceived image quality.
  • Figure 4: SbS comparison between two subsequent checkpoints. Green corresponds to statistical improvement, grey to equality. Three evaluations without improvement in a row indicate convergence.
  • Figure 5: Visual comparison between GAN and diffusion SR model from our work and the baselines on SR($\times 4$) + degradations. Zoom in for the best view.
  • ...and 9 more figures