
Pairwise Distance Distillation for Unsupervised Real-World Image Super-Resolution

Yuehan Zhang, Seungjun Lee, Angela Yao

TL;DR

This work tackles unsupervised real-world image super-resolution under unknown degradations by introducing Pairwise Distance Distillation (PDD), which jointly leverages a specialist trained on synthetic degradations and a generalist trained on broader degradations. PDD enforces two forms of distance consistency in VGG feature space: intra-model distances between a model's predictions on real-world and synthetic inputs, and inter-model distances between the specialist and generalist across domains, using both feature and Gram-matrix statistics. The method optimizes a hybrid loss combining supervised terms on synthetic data and unsupervised distillation terms on real-world data, with two initialization schemes (static and EMA) that affect performance. Empirical results on RealSR, DRealSR, and NTIRE20 show that PDD improves fidelity and perceptual quality, often outperforming state-of-the-art RWSR methods, and are corroborated by user studies; code is provided for reproducibility.
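The two distance-consistency terms and the hybrid objective can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: feature maps are precomputed `(C, H, W)` arrays, both the feature term and the Gram-matrix term use a mean absolute difference, the supervised term is an L1 loss, and the assignment of $d_S, d_G$ (intra-model) and $\Delta_U, \Delta_L$ (inter-model) follows one reading of the Figure 2 notation. All function names and the weight `lam` are hypothetical.

```python
import numpy as np


def gram(feat):
    """Normalized Gram matrix (channel correlations) of a (C, H, W) feature map."""
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return f @ f.T / (c * h * w)


def dist(a, b):
    """Assumed distance: L1 on raw features plus L1 on Gram matrices."""
    return np.abs(a - b).mean() + np.abs(gram(a) - gram(b)).mean()


def pdd_loss(f_s_u, f_s_l, f_g_u, f_g_l):
    """Hypothetical pairwise distance distillation term.

    f_<model>_<domain>: features of the specialist (s) or generalist (g)
    for real-world (u, unlabeled) or synthetic (l, labeled) predictions.
    """
    # Intra-model distances: each model's real-world vs. synthetic predictions.
    d_s = dist(f_s_u, f_s_l)
    d_g = dist(f_g_u, f_g_l)
    # Inter-model distances: specialist vs. generalist within each domain.
    delta_u = dist(f_s_u, f_g_u)
    delta_l = dist(f_s_l, f_g_l)
    # Penalize inconsistency between the paired distances.
    return abs(d_s - d_g) + abs(delta_u - delta_l)


def hybrid_loss(pred_s_l, target_l, feats, lam=1.0):
    """Supervised L1 on synthetic data plus the weighted unsupervised PDD term."""
    supervised = np.abs(pred_s_l - target_l).mean()
    return supervised + lam * pdd_loss(*feats)


rng = np.random.default_rng(0)
a, b = rng.standard_normal((2, 8, 4, 4))
# Identical specialist and generalist features give zero PDD loss.
print(pdd_loss(a, b, a, b))  # 0.0
```

In an autodiff framework, the generalist's features would typically be treated as constants so that only the specialist is updated, which matches a frozen or EMA-updated generalist.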

Abstract

Standard single-image super-resolution creates paired training data from high-resolution images through fixed downsampling kernels. However, real-world super-resolution (RWSR) faces unknown degradations in the low-resolution inputs while also lacking paired training data. Existing methods approach this problem by learning blind general models through complex synthetic augmentations on training inputs; they sacrifice performance on any specific degradation in exchange for broader generalization to many possible ones. We address unsupervised RWSR for a targeted real-world degradation. We approach the problem from a distillation perspective and introduce a novel pairwise distance distillation framework. Through our framework, a model specialized in a synthetic degradation adapts to target real-world degradations by distilling intra- and inter-model distances across the specialized model and an auxiliary generalized model. Experiments on diverse datasets demonstrate that our method significantly enhances fidelity and perceptual quality, surpassing state-of-the-art approaches in RWSR. The source code is available at https://github.com/Yuehan717/PDD.
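The fixed-kernel pipeline mentioned at the start of the abstract can be illustrated with a short NumPy sketch. As an assumption for brevity, a box (block-average) kernel stands in for the bicubic kernel used in practice; `make_pair` and its `scale` parameter are hypothetical names. Because the kernel never changes, a model trained only on such pairs becomes a specialist for that one degradation.

```python
import numpy as np


def make_pair(hr, scale=4):
    """Build an (LR, HR) training pair with a fixed box downsampling kernel.

    hr: (H, W) or (H, W, C) array. Each non-overlapping scale x scale block
    is averaged, i.e. a box blur followed by stride-`scale` subsampling.
    """
    h, w = hr.shape[:2]
    h, w = h - h % scale, w - w % scale  # crop so both sides divide evenly
    hr = hr[:h, :w]
    blocks = hr.reshape(h // scale, scale, w // scale, scale, *hr.shape[2:])
    lr = blocks.mean(axis=(1, 3))
    return lr, hr


hr = np.random.default_rng(0).random((96, 96, 3))
lr, hr = make_pair(hr, scale=4)
print(lr.shape, hr.shape)  # (24, 24, 3) (96, 96, 3)
```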

Paper Structure (14 sections, 11 equations, 9 figures, 3 tables)

Figures (9)

  • Figure 1: $\hat{Y}_{G}^{U}$ and $\hat{Y}_{G}^{L}$ are reconstructions of real-world and bicubic-interpolated (BI) inputs by a blind generalized model (generalist); $\hat{Y}_{S}^{U}$ and $\hat{Y}_{S}^{L}$ are the counterparts from a standard SR model (bicubic specialist). The specialist restores the BI input with clearer details, while the generalist performs better on the real-world input. We distill the intra- and inter-model distances to obtain an improved real-world reconstruction $\hat{Y}_{S^*}^{U}$.
  • Figure 2: Schematic of Pairwise Distance Distillation (PDD). $X^U$ and $X^L$ are unlabeled (real-world) and labeled (synthetic) inputs. Model $M_G$ is a generalist trained with an extensive synthetic degradation pipeline, while $M_S$ specializes in $X^L$. The prediction $\hat{Y}_{S}^{L}$ is supervised by its ground truth throughout training. PDD enforces consistency between $\Delta_U$ and $\Delta_L$ and between $d_S$ and $d_G$ to improve $M_S$'s real-world performance.
  • Figure 3: Two initialization options for the generalist and specialist models. (a) The static configuration initializes $M_G$ with a model pre-trained with the complex synthetic pipeline and $M_S$ with a model pre-trained on the simple degradation of $X^L$. The weights of $M_G$ are frozen during distillation. (b) Both $M_S$ and $M_G$ are initialized with a pre-trained generalized model, and the weights of $M_G$ are an exponential moving average (EMA) of those of $M_S$.
  • Figure 4: Improvements over the Generalist on five unlabeled data domains (lower LPIPS is better). ND improves the fidelity score (PSNR) but sharply degrades the perceptual scores (LPIPS and NRQM). Both versions of our method achieve larger improvements than ND on all reported metrics. In every domain, the Static version improves at least one perceptual metric, and the EMA version improves all of them.
  • Figure 5: Visualization of low-level features of predictions for DIV2K [Agustsson_2017_CVPR_Workshops] (bicubic) and NTIRE20 (unknown degradation), following [liu2021discovering]. (a) The generalist's predictions for the two domains have overlapping distributions. (b) Predictions from the specialist, which is adept only at bicubic inputs, have separated distributions. (c) After applying our method (static version), the predictions for the two domains are pushed closer together.
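The EMA option in Figure 3(b) can be sketched with parameters stored as a dict of NumPy arrays. This is an illustrative sketch: the dict layout, the helper names `init_ema` and `ema_update`, and the default momentum of 0.999 are assumptions, not values from the paper. The generalist is never updated by gradients; after each optimizer step on the specialist, it moves a small step toward the specialist's weights.

```python
import numpy as np


def init_ema(pretrained):
    """EMA option: specialist and generalist both start from one pretrained generalist."""
    specialist = {k: v.copy() for k, v in pretrained.items()}
    generalist = {k: v.copy() for k, v in pretrained.items()}
    return specialist, generalist


def ema_update(generalist, specialist, momentum=0.999):
    """Move each generalist weight toward the specialist: g <- m * g + (1 - m) * s."""
    for name, w_s in specialist.items():
        generalist[name] = momentum * generalist[name] + (1.0 - momentum) * w_s
    return generalist


spec, gen = init_ema({"conv.weight": np.zeros(3)})
spec["conv.weight"] = spec["conv.weight"] + 1.0  # stand-in for an optimizer step
ema_update(gen, spec, momentum=0.9)
print(gen["conv.weight"])  # [0.1 0.1 0.1]
```

A momentum close to 1 makes the generalist a slowly changing, smoothed copy of the specialist, which is one common way to obtain a stable teacher.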