Discovering Distinctive "Semantics" in Super-Resolution Networks

Yihao Liu, Anran Liu, Jinjin Gu, Zhipeng Zhang, Wenhao Wu, Yu Qiao, Chao Dong

TL;DR

This paper reveals that deep super-resolution (SR) networks learn deep degradation representations (DDR): "semantics" that distinguish image degradations rather than image content. Using dimensionality reduction and visualization across CinCGAN, SRResNet, and SRGAN, the authors show that DDR is shaped by adversarial learning and global residual, and that it evolves during training in step with improved SR performance. DDR supports practical tasks such as distortion identification, blind SR with degradation-embedded guidance, and generalization evaluation, offering a new lens for understanding and improving low-level vision models. The work has implications for interpretability, degradation disentanglement, and robust SR design in real-world settings.

Abstract

Image super-resolution (SR) is a representative low-level vision problem. Although deep SR networks have achieved extraordinary success, their working mechanisms remain poorly understood. Specifically, can SR networks learn semantic information, or do they merely perform a complex mapping function? What hinders SR networks from generalizing to real-world data? These questions not only arouse our curiosity but also influence SR network development. In this paper, we make a first attempt to answer these fundamental questions. After comprehensively analyzing the feature representations (via dimensionality reduction and visualization), we discover the distinctive "semantics" in SR networks, i.e., deep degradation representations (DDR), which relate to image degradation rather than image content. We show that a well-trained deep SR network is naturally a good descriptor of degradation information. Our experiments also reveal two key factors (adversarial learning and global residual) that influence the extraction of such semantics. We further apply DDR in several applications (distortion identification, blind SR, and generalization evaluation) and achieve promising results, supporting the validity and usefulness of our findings.

Paper Structure

This paper contains 25 sections, 10 equations, 24 figures, 7 tables.

Figures (24)

  • Figure 1: Distributions of the deep representations of classification and super-resolution networks. For classification networks, the semantics of the deep feature representations are predefined by the training data (category labels). SR networks, however, learn deep representations with a different kind of "semantics". During training, the SR networks are only provided with downsampled clean LR images, and there is no supervision signal related to image degradation. Surprisingly, we find that the deep representations of SR networks are spontaneously discriminative to different degradations. Notably, not every SR network has this property. In Sec. \ref{sec:two_factors}, we reveal two factors that help SR networks extract such degradation-related representations, i.e., adversarial learning and global residual.
  • Figure 2: Different degraded input images and their corresponding outputs produced by CinCGAN and BM3D. CinCGAN is trained on the DIV2K-mild dataset in an unpaired manner. If the input image conforms to the training data distribution, CinCGAN generates better restoration results than BM3D (a). Otherwise, it tends to ignore the unseen degradation types and leaves the input images almost untouched (b) and (c). In contrast, the traditional method BM3D performs stably, with similar denoising effects on all input images regardless of the input degradation type. Zoom in for best view.
  • Figure 3: (a)-(d): The projected deep feature representations. The deep features of CinCGAN and SRGAN are separated by degradation types, even if the image contents are aligned. (e)-a: ResNet18 for classification. "Conv2_x" represents the 2nd group of residual blocks. (e)-b: SRResNet-woGR (without global residual). (e)-c: SRResNet (with global residual). "RB1" represents the 1st residual block. Please zoom in for best view.
  • Figure 4: Projected feature representations extracted from different layers of ResNet18 using t-SNE. As the network deepens, the representations become more discriminative to object categories, which clearly shows the semantics of the representations in classification.
  • Figure 5: Feature representation differences between classification and SR networks. The same object category is represented by the same color, and the same image degradation type is depicted by the same marker shape. For the classification network, feature representations cluster by color, while those of the SR network cluster by marker shape, indicating a significant difference in feature representations between classification and SR networks.
  • ...and 19 more figures
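
The clustering behaviour described in the Figure 5 caption (features grouping by degradation type rather than by object category) can be checked numerically as well as visually. The following is a minimal sketch, assuming the projected features are available as 2-D points with two label sets. It uses the Calinski-Harabasz index (ratio of between-cluster to within-cluster dispersion) as the separation score; that choice, the function names, and the toy coordinates are all assumptions made for illustration and are not taken from the text above.

```python
from collections import defaultdict


def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    n = len(points)
    dim = len(points[0])
    return [sum(p[d] for p in points) / n for d in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def calinski_harabasz(points, labels):
    """Between-cluster dispersion over within-cluster dispersion.

    Higher values mean the given labels separate the points more cleanly.
    """
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    n, k = len(points), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 clusters and more points than clusters")
    overall = centroid(points)
    between = 0.0
    within = 0.0
    for members in groups.values():
        c = centroid(members)
        between += len(members) * sq_dist(c, overall)
        within += sum(sq_dist(p, c) for p in members)
    if within == 0.0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical 2-D projections of SR-network features (toy values only):
# the x coordinate tracks degradation, the y coordinate tracks content.
features = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
degradation = ["clean", "clean", "blur", "blur"]
content = ["cat", "dog", "cat", "dog"]

score_by_degradation = calinski_harabasz(features, degradation)
score_by_content = calinski_harabasz(features, content)

print("separation by degradation:", score_by_degradation)
print("separation by content:", score_by_content)
```

On these toy points the score is much higher when the points are grouped by degradation than by content, which is the pattern the caption attributes to SR networks. A classification network would be expected to show the opposite ordering.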