
Fortifying Fully Convolutional Generative Adversarial Networks for Image Super-Resolution Using Divergence Measures

Arkaprabha Basu, Kushal Bose, Sankha Subhra Mullick, Anish Chakrabarty, Swagatam Das

TL;DR

This paper addresses 4x image super-resolution by introducing SuRGe, a fully convolutional GAN that preserves and adaptively mixes features from multiple depths of the generator. It adds two auxiliary objectives for the generator: a Jensen-Shannon divergence loss $\mathcal{L}^{G}_{\textrm{JS}}$ between the SR and HR distributions, and a Gromov-Wasserstein loss $\mathcal{L}^{G}_{\textrm{GW}}$ between the LR and SR distributions. The discriminator is trained with the Wasserstein loss with gradient penalty to curb mode collapse. The generator loss is formed dynamically as a Softmax-weighted convex combination of the adversarial, JS, and GW terms, which is intended to balance the three objectives during optimization. Architecturally, the generator uses learnable convex feature mixing ($F_{0}$ and $F_{1}$), two-stage 2x upscaling, and nearest-neighbor upsampling to reduce artifacts. Empirical results on DIV2K and 10 benchmark datasets show state-of-the-art PSNR/SSIM, with favorable inference time and parameter efficiency. The work highlights the potential of explicit distributional divergences in guiding SR and suggests avenues for robustness and extension to other scaling factors.
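The Softmax-weighted combination of the three generator terms can be sketched as follows. This is a minimal illustration, not the paper's equation: it assumes the weights are the Softmax of the current scalar loss values, so the largest term receives the largest weight. The function names and the example loss values are hypothetical.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def combine_generator_losses(loss_adv, loss_js, loss_gw):
    """Convex combination of three scalar losses.

    Assumption: the weights are softmax(loss values). The paper's exact
    weighting rule is not reproduced here and may differ.
    """
    losses = [loss_adv, loss_js, loss_gw]
    weights = softmax(losses)  # non-negative and summing to 1
    total = sum(w * l for w, l in zip(weights, losses))
    return total, weights

# Hypothetical loss values for one training step.
total, weights = combine_generator_losses(0.9, 0.3, 0.5)
print(total, weights)
```

Because the weights are non-negative and sum to one, the combined value always lies between the smallest and the largest of the three terms.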

Abstract

Super-Resolution (SR) is a time-hallowed image processing problem that aims to improve the quality of a Low-Resolution (LR) sample up to the standard of its High-Resolution (HR) counterpart. We aim to address this by introducing Super-Resolution Generator (SuRGe), a fully-convolutional Generative Adversarial Network (GAN)-based architecture for SR. We show that distinct convolutional features obtained at increasing depths of a GAN generator can be optimally combined by a set of learnable convex weights to improve the quality of generated SR samples. In the process, we employ the Jensen-Shannon and the Gromov-Wasserstein losses respectively between the SR-HR and LR-SR pairs of distributions to further aid the generator of SuRGe to better exploit the available information in an attempt to improve SR. Moreover, we train the discriminator of SuRGe with the Wasserstein loss with gradient penalty, to primarily prevent mode collapse. The proposed SuRGe, as an end-to-end GAN workflow tailor-made for super-resolution, offers improved performance while maintaining low inference time. The efficacy of SuRGe is substantiated by its superior performance compared to 18 state-of-the-art contenders on 10 benchmark datasets.
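As a rough illustration of combining features from two depths with learnable convex weights, the sketch below mixes two feature vectors with a single coefficient. It assumes the coefficient is parametrized as the sigmoid of an unconstrained scalar, which keeps it in (0, 1). This parametrization, the function names, and the toy feature values are assumptions, not details taken from the paper.

```python
import math

def sigmoid(t):
    """Map an unconstrained scalar to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))

def convex_mix(shallow, deep, theta):
    """Element-wise convex combination alpha * shallow + (1 - alpha) * deep.

    Assumption: alpha = sigmoid(theta), where theta would be a learnable
    parameter updated by gradient descent in a real network.
    """
    if len(shallow) != len(deep):
        raise ValueError("feature vectors must have the same length")
    alpha = sigmoid(theta)
    return [alpha * s + (1.0 - alpha) * d for s, d in zip(shallow, deep)]

# Toy features standing in for flattened convolutional feature maps.
shallow_features = [0.0, 2.0, 4.0]
deep_features = [1.0, 1.0, 1.0]
mixed = convex_mix(shallow_features, deep_features, theta=0.0)
print(mixed)  # theta = 0 gives alpha = 0.5, so this prints [0.5, 1.5, 2.5]
```

Every mixed value stays between the corresponding shallow and deep values, which is the property that makes the combination convex.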

Paper Structure

This paper contains 22 sections, 9 equations, 11 figures, 9 tables, 1 algorithm.

Figures (11)

  • Figure 1: Visual comparison of 4x super-resolution outputs of the proposed SuRGe with SRGAN [srgan_cvpr2017], BSRGAN [gu2019bsrgan], SWIN-IR [swiniriccvw2021], and LTE [lee2022lte], given a low-resolution (LR) input image patch. SuRGe produces better super-resolution images with finer texture, color, and intricate details.
  • Figure 2: The schematic of SuRGe shows its two main components: (a) the generator $G$ and (b) the discriminator $D$. In (c), we detail the structure of the Repetitive Residual Block sub-network used in $G$ and $D$. $G$ takes an LR image $\mathbf{x}$ and generates a 4x up-scaled SR image $G(\mathbf{x})$. $D$ guides $G$ by distinguishing whether an input is the HR ground truth $\mathbf{y}$ or the SR image $G(\mathbf{x})$. Further details on the network design can be found in the appendix (\ref{app:sec:networkDetails}).
  • Figure 3: Compared with the HR patch (a) of a butterfly image, the checkerboard pattern introduced by PixelShuffle [checkerboardartifact_pixelshuffle] is apparent in (b). Nearest-neighbor up-scaling in SuRGe generates a clean $G(\mathbf{x})$, as evident from our result in (c).
  • Figure 4: We extract a patch from the HR image as $\mathbf{y}$ and down-scale it to obtain the LR input $\mathbf{x}$. $\mathbf{x}$ is fed to $G$ to obtain the 4x SR output $G(\mathbf{x})$. $G(\mathbf{x})$ is used to compute $\mathcal{L}^{G}_{a}$ using equation (\ref{eq:advloss}), $G(\mathbf{x})$ and $\mathbf{y}$ together are used to compute $\mathcal{L}^{G}_{\textrm{JS}}$ using equation (\ref{eq:jsLoss}), and $\mathbf{x}$ with $G(\mathbf{x})$ yields $\mathcal{L}^{G}_{\textrm{GW}}$ using equation (\ref{eq:gwLoss}). We take the Softmax-based dynamic convex combination of $\mathcal{L}^{G}_{a}$, $\mathcal{L}^{G}_{\textrm{JS}}$, and $\mathcal{L}^{G}_{\textrm{GW}}$ as per equation (\ref{eq:dynLoss}) to obtain $\mathcal{L}^{G}$, which is used to update $G$. To update $D$, we use $\mathbf{y}$ and $G(\mathbf{x})$ to calculate $\mathcal{L}^{D}$ using equation (\ref{eqn:DLoss}).
  • Figure 5: We show the generated SR patch for a butterfly (Set5) test instance, along with the metrics (on top, as PSNR/SSIM), at intervals of 50 training epochs of SuRGe. The gradual improvement in the SR output of SuRGe is apparent as training progresses.
  • ...and 6 more figures
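
The Jensen-Shannon term named in the Figure 4 caption compares the SR and HR distributions. The sketch below computes the standard Jensen-Shannon divergence between two discrete distributions. It assumes, purely for illustration, that each image is summarized by a normalized intensity histogram; how the paper builds the distributions is not stated in this section, and the histogram counts are made up.

```python
import math

def normalize(counts):
    """Turn non-negative counts into a probability distribution."""
    total = float(sum(counts))
    if total <= 0.0:
        raise ValueError("counts must have a positive sum")
    return [c / total for c in counts]

def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i = 0 contribute zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their midpoint."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical 4-bin intensity histograms for an SR output and its HR target.
sr_hist = normalize([4, 3, 2, 1])
hr_hist = normalize([1, 2, 3, 4])

js_same = js_divergence(hr_hist, hr_hist)  # identical inputs
js_diff = js_divergence(sr_hist, hr_hist)  # mismatched inputs
print(js_same, js_diff)
```

The divergence is symmetric, is zero for identical distributions, and is bounded above by $\ln 2$ when natural logarithms are used, so it remains finite even when the two histograms have different supports.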