Implicit Grid Convolution for Multi-Scale Image Super-Resolution

Dongheon Lee, Seokju Yun, Youngmin Ro

TL;DR

This paper proposes a multi-scale framework that employs a single encoder in conjunction with Implicit Grid Convolution (IGConv), the authors' novel upsampler, which unifies SPConv across all scales within a single module and achieves comparable performance to existing fixed-scale methods.

Abstract

For Image Super-Resolution (SR), it is common to train and evaluate scale-specific models composed of an encoder and upsampler for each targeted scale. Consequently, many SR studies encounter substantial training times and complex deployment requirements. In this paper, we address this limitation by training and evaluating multiple scales simultaneously. Notably, we observe that encoder features are similar across scales and that Sub-Pixel Convolution (SPConv), a widely used scale-specific upsampler, exhibits strong inter-scale correlations in its functionality. Building on these insights, we propose a multi-scale framework that employs a single encoder in conjunction with Implicit Grid Convolution (IGConv), our novel upsampler, which unifies SPConv across all scales within a single module. Extensive experiments demonstrate that our framework achieves comparable performance to existing fixed-scale methods while reducing the training budget and stored parameters three-fold and maintaining the same latency. Additionally, we propose IGConv$^{+}$ to improve performance further by addressing spectral bias and allowing input-dependent upsampling and ensembled prediction. As a result, ATD-IGConv$^{+}$ achieves a notable 0.21 dB improvement in PSNR on Urban100$\times$4, while also reducing the training budget, stored parameters, and inference cost compared to the existing ATD.
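
To make the scale-specific nature of SPConv concrete, the following is a minimal NumPy sketch of a sub-pixel upsampler, not the authors' implementation. It assumes a 1×1 convolution for brevity (practical SR models typically use larger kernels), the channel ordering of a standard pixel-shuffle layout, and hypothetical names (`depth_to_space`, `spconv_1x1`, `features`). It shows that each scale $r$ needs its own bank of $C_{out} \cdot r^2$ filters, which is the per-scale redundancy a shared multi-scale upsampler aims to remove.

```python
import numpy as np

def depth_to_space(x, r):
    """Rearrange (C*r*r, H, W) sub-pixel channels into a (C, H*r, W*r) image."""
    c_rr, h, w = x.shape
    assert c_rr % (r * r) == 0, "channel count must be divisible by r*r"
    c = c_rr // (r * r)
    x = x.reshape(c, r, r, h, w)      # split channels into (c, i, j)
    x = x.transpose(0, 3, 1, 4, 2)    # reorder to (c, h, i, w, j)
    return x.reshape(c, h * r, w * r)

def spconv_1x1(features, weight, r):
    """Toy SPConv: 1x1 convolution to C_out*r*r channels, then depth-to-space.

    features: (C_in, H, W); weight: (C_out*r*r, C_in). The 1x1 kernel is an
    assumption made for brevity, not a detail taken from the paper.
    """
    sub_pixels = np.einsum("oc,chw->ohw", weight, features)
    return depth_to_space(sub_pixels, r)

rng = np.random.default_rng(0)
features = rng.standard_normal((8, 4, 5))  # hypothetical encoder output
outputs = {}
for r in (2, 3, 4):
    # A fixed-scale model stores a separate filter bank for every scale.
    weight = rng.standard_normal((3 * r * r, 8))
    outputs[r] = spconv_1x1(features, weight, r)
    print(r, weight.shape, outputs[r].shape)
```

The filter count grows with $r^2$ while the encoder output stays the same shape, so in this sketch the three scales differ only in the upsampler weights.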

Paper Structure (26 sections, 9 equations, 9 figures, 12 tables)

Figures (9)

  • Figure 1: Efficiency and performance comparison of existing upsamplers (SPConv and SPConv$^{+}$) with our proposals (IGConv and IGConv$^{+}$) across various metrics and models. Efficiency metrics are measured by reconstructing an HD (1280$\times$720) image on an A6000 GPU after instantiating our proposals at the $\times$2 scale.
  • Figure 2: The structure of SR models. (a) illustrates the classic fixed-scale SR methods employing SPConv and SPConv$^{+}$, while (b) illustrates our multi-scale framework employing IGConv and IGConv$^{+}$. Our proposed methods use a hyper-network to generate convolution filters based on scale and employ IGSample as a sub-module for efficient input-dependent upsampling. FGRep is employed to improve performance by performing ensemble prediction with a single forward pass.
  • Figure 3: Visualization of CKA similarity between feature maps at scales $\times$2, $\times$3, and $\times$4 across layers of SMFANet$+$, HiT-SRF, and MambaIR. The CKA similarity shows that feature maps at different scales become increasingly similar toward the later layers.
  • Figure 4: Visualization of SPConv for scales $\times$4 and $\times$2. Although the SPConvs at different scales employ different numbers of filters, the filtered sub-pixels for all scales exhibit significant 2D spatial correlations (illustrated with color gradients) due to the subsequent $\mathcal{DS}$. Visualized convolution filters trained to capture inter-scale correlations are shown in Figure \ref{fig:implicitgrids}.
  • Figure 5: Visualizations of the first 12 convolution filters inferred by $\mathcal{H}$ of RDN-IGConv$^{+}$ for scales $\times$2, $\times$3, $\times$4, and $\times$32. More visualizations are provided in Appendix \ref{sec:igvis_full}.
  • ...and 4 more figures
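
The Figure 3 caption relies on CKA similarity to compare feature maps across scales. As a rough illustration only, the sketch below computes the linear variant of CKA on synthetic data; the page does not say which CKA variant or feature flattening the authors use, so the linear kernel, the `(n_samples, n_features)` layout, and the names `feats_x2`, `feats_x4`, and `unrelated` are all assumptions.

```python
import numpy as np

def linear_cka(x, y):
    """Linear CKA between feature matrices of shape (n_samples, n_features)."""
    # Center each feature column so the score ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, "fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, "fro")
    norm_y = np.linalg.norm(y.T @ y, "fro")
    return cross / (norm_x * norm_y)

rng = np.random.default_rng(0)
feats_x2 = rng.standard_normal((256, 32))            # synthetic "x2 model" features
rotation, _ = np.linalg.qr(rng.standard_normal((32, 32)))
feats_x4 = 3.0 * feats_x2 @ rotation                 # same content, rotated and rescaled
unrelated = rng.standard_normal((256, 32))           # independent features

cka_same = linear_cka(feats_x2, feats_x4)
cka_diff = linear_cka(feats_x2, unrelated)
print(cka_same, cka_diff)
```

Linear CKA is invariant to orthogonal transforms and isotropic scaling of the features, so the rotated copy scores essentially 1 while the independent features score much lower. A high score between real encoders trained at different scales would be read the same way: the representations carry similar information even if individual channels differ.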