Exploiting Self-Supervised Constraints in Image Super-Resolution

Gang Wu, Junjun Jiang, Kui Jiang, Xianming Liu

TL;DR

This work tackles the ill-posed nature of single image super-resolution by introducing SSC-SR, a self-supervised constraint framework that leverages a dual asymmetric online/target architecture updated via exponential moving average (EMA) to stabilize learning. It combines a reconstruction loss with a self-supervised consistency loss derived from rotation-based data augmentations and a projection head, with pseudo-targets produced by the EMA-updated network. Empirical results show consistent improvements across diverse SR backbones and datasets, including average PSNR gains around 0.1 dB over EDSR and 0.06 dB over SwinIR, and notable gains on Manga109 and Urban100, validating the approach and its plug-and-play nature. Ablation studies confirm the benefits of EMA, projection-head design, and L1-based self-supervised loss, underscoring SSC-SR’s practical impact for enhancing existing SR methods.
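
The EMA update of the target network mentioned above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: parameters are modeled as a plain dictionary of floats, and the function name `ema_update` and the decay value 0.999 are assumptions that do not come from the paper.

```python
from typing import Dict


def ema_update(target: Dict[str, float],
               online: Dict[str, float],
               decay: float = 0.999) -> Dict[str, float]:
    """Return target parameters moved a small step toward the online ones.

    `decay` is a hypothetical default; the value used in the paper is not
    given in this summary.
    """
    return {
        name: decay * target[name] + (1.0 - decay) * online[name]
        for name in target
    }


# Toy usage: one scalar "weight" per network.
target_params = {"w": 1.0}
online_params = {"w": 0.0}
target_params = ema_update(target_params, online_params, decay=0.9)
```

Because the target changes only slowly, the pseudo-targets it produces are smoother over training than the online network's own outputs, which is the stabilizing effect the summary refers to.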

Abstract

Recent advances in self-supervised learning, predominantly studied in high-level visual tasks, have been explored in low-level image processing. This paper introduces a novel self-supervised constraint for single image super-resolution, termed SSC-SR. SSC-SR uniquely addresses the divergence in image complexity by employing a dual asymmetric paradigm and a target model updated via exponential moving average to enhance stability. The proposed SSC-SR framework works as a plug-and-play paradigm and can be easily applied to existing SR models. Empirical evaluations reveal that our SSC-SR framework delivers substantial enhancements on a variety of benchmark datasets, achieving an average increase of 0.1 dB over EDSR and 0.06 dB over SwinIR. In addition, extensive ablation studies corroborate the effectiveness of each constituent in our SSC-SR framework. Code is available at https://github.com/Aitical/SSCSR.

Paper Structure

This paper contains 13 sections, 6 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Overview of the proposed SSC-SR method. We adopt the training framework of BYOL, which comprises an online SR network $f_{SR}$, a target SR network $\hat{f}_{SR}$, and an asymmetric projection head $f_{proj}$. The target SR network $\hat{f}_{SR}$ is updated via an exponential moving average (EMA) strategy. The online SR network $f_{SR}$ is trained with a pixel-wise loss $\mathcal{L}_{p}$ between the super-resolved image $I^{SR}$ and the ground-truth image $I^{HR}$, plus a consistency loss $\mathcal{L}_{c}$ computed between the projected image $\tilde{I}^{SR}$ and the target image $\hat{I}^{SR}$.
  • Figure 2: Visual comparisons between our enhanced models and their original counterparts are presented. The first row displays the results of existing methods, while the second row showcases the corresponding improvements achieved by our retrained models (Zoom in for more detail).
  • Figure 3: Side-by-side visual comparison of the super-resolved images from the online SR network and their counterparts generated by the projection head; the final row shows a divergence map between the two.
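
The training objective described in the Figure 1 caption can be sketched in plain Python as below. This is a hedged illustration under several assumptions, not the paper's code: images are single-channel nested lists, the target branch is assumed to receive a 90-degree rotated copy of the low-resolution input with its output rotated back, the two losses are assumed to be summed with a hypothetical `weight`, and `nearest_x2` is only a toy stand-in for a real SR network. All function names are illustrative.

```python
from typing import Callable, List

Image = List[List[float]]  # single-channel image as nested lists


def rot90(img: Image) -> Image:
    """Rotate an image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def rot270(img: Image) -> Image:
    """Rotate an image 90 degrees counter-clockwise (inverse of rot90)."""
    return [list(row) for row in zip(*img)][::-1]


def l1(a: Image, b: Image) -> float:
    """Mean absolute error between two images of the same size."""
    diffs = [abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def ssc_loss(lr: Image, hr: Image,
             f_sr: Callable[[Image], Image],
             f_sr_target: Callable[[Image], Image],
             f_proj: Callable[[Image], Image],
             weight: float = 1.0) -> float:
    """Pixel loss L_p plus weighted consistency loss L_c (weight is assumed)."""
    sr = f_sr(lr)                               # I^SR from the online network
    pixel = l1(sr, hr)                          # L_p against I^HR
    projected = f_proj(sr)                      # projected image
    # Pseudo-target from the EMA network on a rotated input, rotated back.
    # In a real framework this branch would be excluded from gradients.
    target = rot270(f_sr_target(rot90(lr)))
    consistency = l1(projected, target)         # L_c
    return pixel + weight * consistency


def nearest_x2(img: Image) -> Image:
    """Toy stand-in for an SR network: nearest-neighbour 2x upsampling."""
    out: Image = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.extend([wide, list(wide)])
    return out


lr = [[1.0, 2.0], [3.0, 4.0]]
hr = nearest_x2(lr)
loss = ssc_loss(lr, hr, nearest_x2, nearest_x2, lambda x: x)
```

In this toy setup the upsampler commutes with rotation and the projection head is the identity, so both loss terms vanish; with a learned network, the consistency term instead penalizes disagreement between the projected online output and the target network's pseudo-target.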