Towards Lightweight Super-Resolution with Dual Regression Learning

Yong Guo, Mingkui Tan, Zeshuai Deng, Jingdong Wang, Qi Chen, Jiezhang Cao, Yanwu Xu, Jian Chen

TL;DR

This work addresses the ill-posed nature of image super-resolution by introducing dual regression learning, which couples a primal LR→HR regression with a dual HR→LR reconstruction to constrain the mapping space and improve generalization. It defines a primal mapping $P: \mathcal{X} \to \mathcal{Y}$ and a dual downsampling mapping $D: \mathcal{Y} \to \mathcal{X}$ with the objective ${\mathcal{L}}_{DR}(P,D) = {\mathcal{L}}_{P}(P(x),y) + \lambda {\mathcal{L}}_{D}(D(P(x)),x)$, yielding a smaller generalization bound ${\mathcal{B}}(P,D) \le {\mathcal{B}}(P)$. To obtain lightweight SR models, it introduces Dual Regression Compression (DRC), a two-stage pipeline: (i) channel-number search guided by ${\mathcal{L}}_{DR}$ to identify layerwise redundancy, and (ii) channel pruning driven by a joint ${\mathcal{L}}_{M}$ and ${\mathcal{L}}_{DR}$ objective with an ${\ell}_{0}$ constraint. The framework is validated on CNN- and transformer-based SR architectures, achieving state-of-the-art accuracy with substantial reductions in parameters and FLOPs under both non-blind and blind degradation settings.

Abstract

Deep neural networks have exhibited remarkable performance in image super-resolution (SR) tasks by learning a mapping from low-resolution (LR) images to high-resolution (HR) images. However, SR is typically an ill-posed problem, and existing methods have several limitations. First, the space of possible SR mappings can be extremely large, since many different HR images can correspond to the same LR image. As a result, it is hard to directly learn a good SR mapping from such a large space. Second, achieving strong SR performance often requires very large models with extremely high computational cost. In practice, one can use model compression techniques to obtain compact models by reducing model redundancy. Nevertheless, existing model compression methods struggle to accurately identify the redundant components because the SR mapping space is so large. To alleviate the first challenge, we propose a dual regression learning scheme to reduce the space of possible SR mappings. Specifically, in addition to the mapping from LR to HR images, we learn an additional dual regression mapping to estimate the downsampling kernel and reconstruct LR images. In this way, the dual mapping acts as a constraint that reduces the space of possible mappings. To address the second challenge, we propose a dual regression compression (DRC) method that reduces model redundancy at both the layer level and the channel level based on channel pruning. Specifically, we first develop a channel number search method that minimizes the dual regression loss to determine the redundancy of each layer. Given the searched channel numbers, we further use the dual regression scheme to evaluate the importance of channels and prune the redundant ones. Extensive experiments show the effectiveness of our method in obtaining accurate and efficient SR models.
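As a rough illustration of how the dual term narrows the mapping space, the objective ${\mathcal{L}}_{DR}(P,D) = {\mathcal{L}}_{P}(P(x),y) + \lambda {\mathcal{L}}_{D}(D(P(x)),x)$ can be sketched on a toy 1-D signal. Everything below is an assumption made for illustration: the L1 losses, the nearest-neighbour and average-pooling stand-ins for $P$ and $D$, the sample values, and $\lambda = 0.1$ are hypothetical and are not taken from the paper's implementation.

```python
# Toy 1-D sketch of the dual regression objective
#   L_DR(P, D) = L_P(P(x), y) + lam * L_D(D(P(x)), x)
# All functions are hypothetical stand-ins, not the paper's networks.

def l1_loss(pred, target):
    """Mean absolute error between two equal-length sequences."""
    assert len(pred) == len(target)
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def primal_upsample(x, scale=2):
    """Stand-in for P: nearest-neighbour upsampling (LR -> HR)."""
    return [v for v in x for _ in range(scale)]

def primal_shifted(x, scale=2):
    """A second candidate P that also brightens every pixel by 1."""
    return [v + 1.0 for v in primal_upsample(x, scale)]

def dual_downsample(y, scale=2):
    """Stand-in for D: average pooling (HR -> LR)."""
    return [sum(y[i:i + scale]) / scale for i in range(0, len(y), scale)]

def dual_regression_loss(x, y, primal, dual, lam=0.1):
    """Primal HR loss plus lam times the dual LR-reconstruction loss."""
    y_hat = primal(x)      # estimated HR signal
    x_hat = dual(y_hat)    # LR signal reconstructed from the estimate
    return l1_loss(y_hat, y) + lam * l1_loss(x_hat, x)

x = [1.0, 3.0]              # LR sample
y = [1.0, 2.0, 3.0, 4.0]    # paired HR sample

# Both candidates have the same primal loss (0.5) on this sample.
primal_a = l1_loss(primal_upsample(x), y)
primal_b = l1_loss(primal_shifted(x), y)

# Only the first candidate maps back to the original LR input,
# so the dual term separates them.
loss_a = dual_regression_loss(x, y, primal_upsample, dual_downsample)
loss_b = dual_regression_loss(x, y, primal_shifted, dual_downsample)
print(primal_a, primal_b, loss_a, loss_b)
```

In this toy case the primal loss alone cannot distinguish the two candidate mappings, while the dual term penalizes the one whose output does not downsample back to the LR input. This is the sense in which the dual mapping acts as a constraint.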

Paper Structure

This paper contains 25 sections, 1 theorem, 6 equations, 7 figures, 8 tables, 1 algorithm.

Key Result

Theorem 1

Let ${\mathcal{L}}_{\rm DR}(P,D)$ be a mapping from ${\mathcal{X}} {\times} {\mathcal{Y}}$ to $[0, 1]$ and ${\mathcal{H}}_{dual}$ be the function space. Let $N$ denote the number of samples and $\hat{R}_{{\mathcal{Z}}}^{DL}$ represent the empirical Rademacher complexity [mohri2012foundations] of dual l

Figures (7)

  • Figure 1: The proposed dual regression learning scheme contains a primal regression task for SR and a dual regression task to reconstruct LR images. The primal and dual regression tasks form a closed-loop.
  • Figure 2: Overview of the dual regression compression (DRC) approach. Given a target compression ratio $r$, we first determine the redundancy of each layer by performing the dual regression based channel number search. Then, according to the searched channel numbers, we evaluate the importance of channels and prune those redundant ones to obtain the compressed model $\widehat{P}$.
  • Figure 3: The dual regression based channel pruning method. We evaluate the importance of channels by computing both the feature reconstruction loss ${\mathcal{L}}_{\rm M}$ and the dual regression loss ${\mathcal{L}}_{\rm DR}$. Here, ${\bf X}^{(l+1)}$ and $\widehat{{\bf X}}^{(l+1)}$ denote the output features of the $l$-th layer in the original model and the pruned model, respectively. $c_l$ and $\hat{c}_l$ denote the channel number of the $l$-th layer in the original model and the pruned model. The red dashed box represents our proposed dual regression loss ${\mathcal{L}}_{\rm DR}$.
  • Figure 4: Overview of most channel number search strategies and our importance-aware search strategy. (a) Some previous works [peng2019efficient, fang2020densely] assume that different channel configurations should be treated individually. For two candidate numbers of channels $k_1$ and $k_2$ ($k_1 < k_2$), the selected $k_2$ channels are independent of the $k_1$ channels. (b) Some other works [guo2020single, wan2020fbnetv2, wang2020revisiting] use the weight-sharing strategy to reduce the search cost. For two candidate numbers of channels $k_1$ and $k_2$ ($k_1 < k_2$), the selected $k_2$ channels contain all the $k_1$ channels. The weights of these overlapped channels are shared across different sets of channels during searching. (c) Based on the weight-sharing strategy, we further propose an importance-aware search strategy that searches for a suitable channel configuration in order to recognize and reduce layer-wise redundancy. For each candidate number of channels $k$, we select the top-$k$ important channels and ignore the remaining redundant channels. Note that we keep the positions of the selected $k$ channels in the original model unchanged, so that the ranking operation does not affect the output features.
  • Figure 5: Visual comparisons of the images produced by different models for $4\times$ image super-resolution on benchmark datasets. We show that all the models enhanced by our DR consistently produce sharper images, i.e., with more high-frequency information, than their original counterparts.
  • ...and 2 more figures
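
The importance-aware selection described in the Figure 4(c) caption can be sketched as a masking step: for a candidate channel number $k$, keep the top-$k$ channels by importance and leave them at their original positions. This is a minimal sketch under stated assumptions. The importance scores, the tie-breaking rule, and the function names `topk_channel_mask` and `apply_mask` are hypothetical, and how the paper actually scores channel importance is not specified here.

```python
def topk_channel_mask(importance, k):
    """Return a 0/1 mask keeping the k most important channels in place."""
    if not 0 <= k <= len(importance):
        raise ValueError("k must be between 0 and the number of channels")
    # Rank channel indices by importance, highest first.
    # Ties are broken by the lower index (an arbitrary choice).
    ranked = sorted(range(len(importance)), key=lambda i: (-importance[i], i))
    keep = set(ranked[:k])
    # The mask follows the original channel order, so no channel is moved.
    return [1 if i in keep else 0 for i in range(len(importance))]

def apply_mask(features, mask):
    """Zero out the channels that were not selected."""
    return [f * m for f, m in zip(features, mask)]

importance = [0.2, 0.9, 0.1, 0.7]   # hypothetical per-channel scores
features = [5.0, 6.0, 7.0, 8.0]     # hypothetical per-channel activations

mask_k1 = topk_channel_mask(importance, 1)
mask_k2 = topk_channel_mask(importance, 2)
masked = apply_mask(features, mask_k2)
print(mask_k1, mask_k2, masked)
```

Because both masks come from the same ranking, the channels kept for $k_1$ are always a subset of those kept for $k_2$ when $k_1 < k_2$, which matches the weight-sharing behaviour in panel (b). Masking rather than reordering keeps each surviving channel at its original index.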

Theorems & Definitions (3)

  • Definition 1
  • Definition 2
  • Theorem 1