ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks

Xintao Wang, Ke Yu, Shixiang Wu, Jinjin Gu, Yihao Liu, Chao Dong, Chen Change Loy, Yu Qiao, Xiaoou Tang

TL;DR

This work addresses the perceptual gap in single-image super-resolution by improving SRGAN in three ways: a Residual-in-Residual Dense Block (RRDB) architecture without batch normalization (BN), a relativistic GAN discriminator, and a perceptual loss computed on features before activation. Network interpolation complements these changes by balancing fidelity against perceptual quality. Together, the components yield sharper, more natural textures with fewer artifacts than SRGAN, and the method won first place in the PIRM2018-SR Challenge. The study also analyzes training techniques for deep networks and the role of large, diverse datasets, giving practical guidance for achieving high perceptual quality in 4× super-resolution.
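
The network interpolation mentioned above can be illustrated with a minimal sketch. It assumes the interpolation is a per-parameter linear blend between a PSNR-oriented generator and its GAN-fine-tuned counterpart, controlled by a factor `alpha` in [0, 1]. The function name `interpolate_params` and the dict-of-lists parameter layout are hypothetical, chosen only to keep the example dependency-free.

```python
def interpolate_params(psnr_params, gan_params, alpha):
    """Blend two parameter sets: (1 - alpha) * psnr + alpha * gan.

    Both arguments map a parameter name to a flat list of floats
    (a hypothetical stand-in for real weight tensors).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if psnr_params.keys() != gan_params.keys():
        raise ValueError("both models must expose the same parameter names")

    blended = {}
    for name, psnr_values in psnr_params.items():
        gan_values = gan_params[name]
        if len(psnr_values) != len(gan_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise linear interpolation between the two models.
        blended[name] = [
            (1.0 - alpha) * p + alpha * g
            for p, g in zip(psnr_values, gan_values)
        ]
    return blended
```

With `alpha = 0` the result equals the PSNR-oriented parameters and with `alpha = 1` the GAN-based ones, so intermediate values trade fidelity against perceptual sharpness without retraining.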

Abstract

The Super-Resolution Generative Adversarial Network (SRGAN) is a seminal work that is capable of generating realistic textures during single image super-resolution. However, the hallucinated details are often accompanied by unpleasant artifacts. To further enhance the visual quality, we thoroughly study three key components of SRGAN (network architecture, adversarial loss, and perceptual loss) and improve each of them to derive an Enhanced SRGAN (ESRGAN). In particular, we introduce the Residual-in-Residual Dense Block (RRDB) without batch normalization as the basic network building unit. Moreover, we borrow the idea from relativistic GAN to let the discriminator predict relative realness instead of the absolute value. Finally, we improve the perceptual loss by using the features before activation, which could provide stronger supervision for brightness consistency and texture recovery. Benefiting from these improvements, the proposed ESRGAN achieves consistently better visual quality with more realistic and natural textures than SRGAN and won first place in the PIRM2018-SR Challenge. The code is available at https://github.com/xinntao/ESRGAN.
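
To make "relative realness" concrete, the sketch below assumes the relativistic average formulation: each raw discriminator output is compared against the mean output of the opposite class before the sigmoid is applied. The function names and the use of plain Python lists for discriminator outputs are illustrative assumptions, not the released implementation.

```python
import math


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def mean(values):
    return sum(values) / len(values)


def relativistic_d_loss(real_logits, fake_logits):
    """Discriminator loss: real samples should score above the average fake,
    and fake samples below the average real."""
    mean_real, mean_fake = mean(real_logits), mean(fake_logits)
    # -log(sigmoid(z)) == softplus(-z)
    loss_real = mean([softplus(-(r - mean_fake)) for r in real_logits])
    # -log(1 - sigmoid(z)) == softplus(z)
    loss_fake = mean([softplus(f - mean_real) for f in fake_logits])
    return loss_real + loss_fake


def relativistic_g_loss(real_logits, fake_logits):
    """Generator loss: the symmetric counterpart with the targets swapped,
    so it depends on both real and generated samples."""
    mean_real, mean_fake = mean(real_logits), mean(fake_logits)
    loss_real = mean([softplus(r - mean_fake) for r in real_logits])
    loss_fake = mean([softplus(-(f - mean_real)) for f in fake_logits])
    return loss_real + loss_fake
```

In this formulation the generator loss involves real images as well as generated ones, whereas a standard GAN generator loss uses only the generated samples.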

Paper Structure

This paper contains 20 sections, 4 equations, 17 figures, 2 tables.

Figures (17)

  • Figure 1: The super-resolution results of $\times 4$ for SRGAN, the proposed ESRGAN and the ground-truth. ESRGAN outperforms SRGAN in sharpness and details.
  • Figure 1: Examples of BN artifacts in PSNR-oriented methods. BN artifacts are more likely to appear in deeper networks, when BN is applied in HR space, and when using a mismatched dataset whose statistics differ from those of the testing dataset.
  • Figure 2: Perception-distortion plane on the PIRM self-validation dataset. We show the baselines of EDSR [lim2017enhanced], RCAN [zhang2018image], and EnhanceNet [sajjadi2017enhancenet], and the submitted ESRGAN model. The blue dots are produced by image interpolation.
  • Figure 2: Examples of BN artifacts in models under the GAN framework.
  • Figure 3: We employ the basic architecture of SRResNet [ledig2017photo], where most computation is done in the LR feature space. We could select or design "basic blocks" (e.g., residual block [he2016deep], dense block [huang2016densely], RRDB) for better performance.
  • ...and 12 more figures