Self-Supervised Learning for Real-World Super-Resolution from Dual and Multiple Zoomed Observations

Zhilu Zhang, Ruohao Wang, Hongzhi Zhang, Wangmeng Zuo

TL;DR

This work addresses real-world RefSR under dual and multiple smartphone zooms without ground-truth HR data. It introduces SelfDZSR++ and its TZSR extension, which learn SR of the ultra-wide image by supervising against the telephoto GT using center-cropped patches, while mitigating misalignment through patch-based optical-flow alignment and an auxiliary-LR guided deformable alignment, and optimizing with local overlapped sliced Wasserstein loss. The method includes a Ref–LR alignment strategy, an EDSR-based restoration module with Ref-conditioned modulation, and a progressive fusion scheme for TZSR to leverage multiple references. Experiments on Nikon and iPhone datasets demonstrate state-of-the-art quantitative and perceptual results with lightweight models, highlighting practical applicability for real-world smartphone imaging.

Abstract

In this paper, we consider two challenging issues in reference-based super-resolution (RefSR) for smartphones: (i) how to choose a proper reference image, and (ii) how to learn RefSR in a self-supervised manner. In particular, we propose a novel self-supervised learning approach for real-world RefSR from observations at dual and multiple camera zooms. First, considering the popularity of multiple cameras in modern smartphones, the more zoomed (telephoto) image can be naturally leveraged as the reference to guide the super-resolution (SR) of the less zoomed (ultra-wide) image, which gives us a chance to learn a deep network that performs SR from dual zoomed observations (DZSR). Second, for self-supervised learning of DZSR, we take the telephoto image, instead of an additional high-resolution image, as the supervision information, and select a center patch from it as the reference to super-resolve the corresponding ultra-wide image patch. To mitigate the effect of the misalignment between the ultra-wide low-resolution (LR) patch and the telephoto ground-truth (GT) image during training, we first adopt patch-based optical flow alignment and then design an auxiliary-LR to guide the deformation of the warped LR features. To generate visually pleasing results, we present a local overlapped sliced Wasserstein loss to better represent the perceptual difference between the GT and the output in the feature space. During testing, DZSR can be directly deployed to super-resolve the whole ultra-wide image with the telephoto image as the reference. In addition, we further take multiple zoomed observations to explore self-supervised RefSR, and present a progressive fusion scheme for the effective utilization of reference images. Experiments show that our methods achieve better quantitative and qualitative performance than state-of-the-art methods. Code is available at https://github.com/cszhilu1998/SelfDZSR_PlusPlus.
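
To make the idea of a local overlapped sliced Wasserstein loss more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes feature maps of shape (H, W, C) have already been extracted by some network. The window size, stride, number of projections, and the function names `sliced_wasserstein` and `local_overlapped_swd` are illustrative assumptions; the abstract does not specify them.

```python
import numpy as np

def sliced_wasserstein(x, y, n_proj=32, rng=None):
    """Sliced Wasserstein distance between two sets of feature vectors.

    x, y: arrays of shape (N, C) with the same N.
    Each set is projected onto random unit directions; the sorted 1-D
    projections are compared with a mean squared difference.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    c = x.shape[1]
    dirs = rng.standard_normal((c, n_proj))
    dirs /= np.linalg.norm(dirs, axis=0, keepdims=True)  # unit-length columns
    px = np.sort(x @ dirs, axis=0)  # sort along the sample axis
    py = np.sort(y @ dirs, axis=0)
    return float(np.mean((px - py) ** 2))

def local_overlapped_swd(fx, fy, patch=8, stride=4, n_proj=32, seed=0):
    """Average the sliced Wasserstein distance over overlapping local windows.

    fx, fy: feature maps of shape (H, W, C).
    patch and stride are assumed values; stride < patch makes windows overlap.
    """
    h, w, c = fx.shape
    if h < patch or w < patch:
        raise ValueError("feature map is smaller than the window size")
    rng = np.random.default_rng(seed)
    losses = []
    for i in range(0, h - patch + 1, stride):
        for j in range(0, w - patch + 1, stride):
            a = fx[i:i + patch, j:j + patch].reshape(-1, c)
            b = fy[i:i + patch, j:j + patch].reshape(-1, c)
            losses.append(sliced_wasserstein(a, b, n_proj, rng))
    return float(np.mean(losses))
```

Because each window compares sorted projections rather than pixel-aligned values, small spatial shifts inside a window change the loss very little, which is one plausible reason such a loss suits imperfectly aligned GT and output features.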

Paper Structure (26 sections, 12 equations, 12 figures, 12 tables, 1 algorithm)

Figures (12)

  • Figure 1: Overall pipeline of the classic RefSR, DZSR, and TZSR during inference. $r_w$ and $r_t$ denote the resolution multiples of the wide-angle and telephoto images, respectively, relative to the ultra-wide image.
  • Figure 2: Overall pipeline of the proposed SelfDZSR++. (a) SelfDZSR++ in the training phase. The original telephoto image is taken as GT ($\mathbf{t}$), while the center areas of the ultra-wide ($\mathbf{u}$) and telephoto ($\mathbf{t}$) images are regarded as LR ($\mathbf{u_c}$) and Ref ($\mathbf{t_c}$) images, respectively. (b) DZSR in the training phase. The auxiliary-LR ($\mathbf{\tilde{u}_c^w}$) is aligned with GT and used for deforming the warped LR ($\mathbf{u_c^w}$) towards the GT by AdaSTN. Then the aligned LR and Ref features are fed into the restoration module. (c) DZSR in the testing phase. The ultra-wide ($\mathbf{u}$) and telephoto ($\mathbf{t}$) images can be regarded as LR and Ref, respectively. The patch-based optical flow alignment and the auxiliary-LR generator are detached. AdaSTN is simplified to a convolution layer.
  • Figure 3: Illustration of the auxiliary-LR generator. The position preserving loss constrains the kernel weight to ensure the alignment between auxiliary-LR and GT, while the content preserving loss constrains the auxiliary-LR to have similar contents and degradations to the LR.
  • Figure 4: Illustration of AdaSTN. The offset estimator predicts the position offsets between the warped LR and auxiliary-LR, then deformable convolution is used to deform warped LR according to the offsets.
  • Figure 5: Alignment between Ref and auxiliary-LR.
  • ...and 7 more figures
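
The training-pair construction described in the Figure 2 caption can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' data pipeline: it assumes the telephoto image covers the central $1/r_t$ of the ultra-wide field of view, and that the Ref is cropped from the telephoto image with the same center fraction. The helper names `center_crop` and `make_training_triplet` and the default `r_t=2.0` are hypothetical.

```python
import numpy as np

def center_crop(img, frac):
    """Return the central region covering `frac` of each spatial side."""
    h, w = img.shape[:2]
    ch, cw = max(1, round(h * frac)), max(1, round(w * frac))
    top, left = (h - ch) // 2, (w - cw) // 2
    return img[top:top + ch, left:left + cw]

def make_training_triplet(u, t, r_t=2.0):
    """Build (LR, Ref, GT) from an ultra-wide image u and a telephoto image t.

    Assumption: the center 1/r_t of u shares its field of view with t.
    """
    gt = t                           # whole telephoto image supervises the output
    lr = center_crop(u, 1.0 / r_t)   # u_c: center area of the ultra-wide image
    ref = center_crop(t, 1.0 / r_t)  # t_c: center area of the telephoto image
    return lr, ref, gt
```

In practice the crops would still need the alignment steps named in the captions (optical flow, auxiliary-LR, AdaSTN), since real camera pairs are not perfectly registered.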