
WaveMixSR-V2: Enhancing Super-resolution with Higher Efficiency

Pranav Jeevan, Neeraj Nixon, Amit Sethi

TL;DR

An enhanced version of the WaveMixSR architecture is presented that replaces the traditional transposed convolution layer with a pixel shuffle operation and uses a multistage design for higher upscaling factors (4x), achieving state-of-the-art results on the BSD100 dataset.
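The two architectural changes can be illustrated with a minimal pure-Python sketch. `pixel_shuffle` performs the depth-to-space rearrangement behind sub-pixel upsampling. It assumes a single image stored as nested lists indexed `[channel][row][col]` and uses the channel ordering of PyTorch's `torch.nn.PixelShuffle`. `stage_2x` and `upscale` are hypothetical helpers that only mimic the multistage shape flow. The learned layers that produce the extra channels are replaced by plain channel copying, so this is not the WaveMixSR-V2 implementation.

```python
from typing import List

Tensor3D = List[List[List[float]]]  # one image, indexed as [channel][row][col]


def pixel_shuffle(x: Tensor3D, r: int) -> Tensor3D:
    """Rearrange C*r*r channels of size H x W into C channels of size (H*r) x (W*r).

    Same index convention as PyTorch's torch.nn.PixelShuffle:
        out[c][h*r + i][w*r + j] = x[c*r*r + i*r + j][h][w]
    """
    if len(x) % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = len(x) // (r * r)
    h, w = len(x[0]), len(x[0][0])
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                # Each group of r*r input channels fills one r x r sub-pixel grid.
                src = x[c * r * r + i * r + j]
                for hh in range(h):
                    for ww in range(w):
                        out[c][hh * r + i][ww * r + j] = src[hh][ww]
    return out


def stage_2x(x: Tensor3D) -> Tensor3D:
    """Hypothetical 2x stage: expand channels 4x, then pixel-shuffle with r=2.

    In the real model the extra channels come from learned layers. Here each
    channel is simply copied 4 times, so the stage reduces to nearest-neighbor
    upsampling. Only the tensor shapes mirror the described design.
    """
    expanded = [ch for ch in x for _ in range(4)]
    return pixel_shuffle(expanded, 2)


def upscale(x: Tensor3D, num_stages: int) -> Tensor3D:
    """Chain 2x stages in series: 2 stages give 4x, 3 stages give 8x."""
    for _ in range(num_stages):
        x = stage_2x(x)
    return x


print(pixel_shuffle([[[1.0]], [[2.0]], [[3.0]], [[4.0]]], 2))  # [[[1.0, 2.0], [3.0, 4.0]]]
y = upscale([[[7.0]]], 2)
print(len(y), len(y[0]), len(y[0][0]))  # 1 4 4
```

Unlike a transposed convolution, pixel shuffle has no weights of its own. In WaveMixSR-V2 it is followed by a convolution (Figure 4c). The 4x output is reached by chaining two 2x stages rather than by a single shuffle with `r=4`.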

Abstract

Recent advancements in single image super-resolution have been predominantly driven by token mixers and transformer architectures. WaveMixSR utilized the WaveMix architecture, which employs a two-dimensional discrete wavelet transform for spatial token mixing, and achieved superior performance in super-resolution tasks with remarkable resource efficiency. In this work, we present an enhanced version of the WaveMixSR architecture by (1) replacing the traditional transposed convolution layer with a pixel shuffle operation and (2) implementing a multistage design for higher upscaling factors ($4\times$). Our experiments demonstrate that our enhanced model, WaveMixSR-V2, outperforms other architectures in multiple super-resolution tasks and achieves state-of-the-art performance on the BSD100 dataset. It also consumes fewer resources and exhibits higher parameter efficiency, lower latency, and higher throughput. Our code is available at https://github.com/pranavphoenix/WaveMixSR.
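The sketch below makes the wavelet-based token mixing concrete. It computes a one-level 2D discrete wavelet transform with the Haar wavelet on a single-channel image stored as nested lists. The choice of Haar, the single decomposition level, and the helper names `haar_dwt2` and `haar_idwt2` are assumptions for illustration. It covers only the fixed transform, not the learned layers that WaveMix applies to its output. Each 2x2 neighborhood is combined into four coefficients, so the output has half the height and width but four subbands.

```python
from typing import List, Tuple

Image = List[List[float]]  # single channel, indexed as [row][col]


def haar_dwt2(x: Image) -> Tuple[Image, Image, Image, Image]:
    """One-level orthonormal 2D Haar DWT of an image with even height and width.

    Each 2x2 block [[a, b], [c, d]] becomes one coefficient in each subband:
      LL = (a + b + c + d) / 2   scaled local average
      LH = (a + b - c - d) / 2   top-row sum minus bottom-row sum
      HL = (a - b + c - d) / 2   left-column sum minus right-column sum
      HH = (a - b - c + d) / 2   diagonal difference
    (LH/HL naming conventions differ between libraries.)
    """
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    ll, lh, hl, hh = ([[0.0] * (w // 2) for _ in range(h // 2)] for _ in range(4))
    for i in range(h // 2):
        for j in range(w // 2):
            a, b = x[2 * i][2 * j], x[2 * i][2 * j + 1]
            c, d = x[2 * i + 1][2 * j], x[2 * i + 1][2 * j + 1]
            ll[i][j] = (a + b + c + d) / 2
            lh[i][j] = (a + b - c - d) / 2
            hl[i][j] = (a - b + c - d) / 2
            hh[i][j] = (a - b - c + d) / 2
    return ll, lh, hl, hh


def haar_idwt2(ll: Image, lh: Image, hl: Image, hh: Image) -> Image:
    """Inverse of haar_dwt2: rebuild the full-resolution image from its four subbands."""
    h2, w2 = len(ll), len(ll[0])
    x = [[0.0] * (2 * w2) for _ in range(2 * h2)]
    for i in range(h2):
        for j in range(w2):
            s, v, u, t = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            x[2 * i][2 * j] = (s + v + u + t) / 2
            x[2 * i][2 * j + 1] = (s + v - u - t) / 2
            x[2 * i + 1][2 * j] = (s - v + u - t) / 2
            x[2 * i + 1][2 * j + 1] = (s - v - u + t) / 2
    return x


print(haar_dwt2([[1.0, 2.0], [3.0, 4.0]]))  # ([[5.0]], [[-2.0]], [[-1.0]], [[0.0]])
```

Because the transform is orthonormal and invertible, halving the spatial resolution discards no information. The detail moves into the three extra subbands, which subsequent layers can treat as additional channels.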

Paper Structure (14 sections, 4 equations, 5 figures, 6 tables)

Figures (5)

  • Figure 1: Comparison of PSNR and SSIM for $2\times$ SR on the BSD100 dataset, showing that WaveMixSR-V2 surpasses the previous state-of-the-art WaveMixSR and other methods such as HAT and SwinFIR. Results for $4\times$ SR are in the Appendix.
  • Figure 2: Architecture of WaveMixSR-V2 for $4\times$ SR, with two $2\times$ SR blocks in series. Details in the Appendix.
  • Figure 3: Simplified block diagram of the WaveMix block in WaveMixSR (left) and the WaveMixSR-V2 block (right). Details in the Appendix.
  • Figure 4: Architecture of WaveMixSR-V2. (a) WaveMixSR-V2 applied to $4\times$ SR, with two $2\times$ SR blocks stacked in series; for higher upscaling factors, more $2\times$ SR blocks are added. (b) Details of the $2\times$ SR block. (c) The WaveMixSR-V2 block, which replaces the transposed convolution with a PixelShuffle operation followed by a convolution.
  • Figure 5: Visual results of $2\times$ SR on the BSD100 dataset. From left to right, the columns show a patch from the HR image (the full HR image is shown as a small inset near the corner), the same patch extracted from the LR image, and the corresponding patch from the model output. The image filename is given below the HR image, and the PSNR/SSIM of the model output is reported below the model output. These values are computed over the whole image, not just the patch.
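Figures 1 and 5 report PSNR, defined as $\mathrm{PSNR} = 10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$ in dB. Here MAX is the largest possible pixel value and MSE is the mean squared error between the HR image and the model output. The minimal sketch below handles one grayscale image and computes the metric over the whole image, as the Figure 5 caption describes. The 8-bit range (`max_value=255`) is an assumption. Protocol details common in SR benchmarks, such as evaluating on a luminance channel or cropping image borders, are not stated here and are not modeled. SSIM, which relies on local windowed statistics, is omitted.

```python
import math
from typing import List

Image = List[List[float]]  # single channel, indexed as [row][col]


def psnr(reference: Image, estimate: Image, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-sized grayscale images."""
    if len(reference) != len(estimate) or any(
        len(r) != len(e) for r, e in zip(reference, estimate)
    ):
        raise ValueError("images must have the same shape")
    sq_err, n = 0.0, 0
    for row_r, row_e in zip(reference, estimate):
        for p, q in zip(row_r, row_e):
            sq_err += (p - q) ** 2
            n += 1
    mse = sq_err / n
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


ref = [[0.0, 0.0], [0.0, 0.0]]
est = [[51.0, 0.0], [0.0, 0.0]]  # MSE = 51^2 / 4 = 650.25 = 255^2 / 100
print(round(psnr(ref, est), 6))  # 20.0
```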