Towards Lightweight Super-Resolution with Dual Regression Learning

Yong Guo; Mingkui Tan; Zeshuai Deng; Jingdong Wang; Qi Chen; Jiezhang Cao; Yanwu Xu; Jian Chen

TL;DR

This work addresses the ill-posed nature of image super-resolution by introducing dual regression learning, which couples a primal LR→HR regression with a dual HR→LR reconstruction to constrain the mapping space and improve generalization. It defines a primal mapping $P: \mathcal{X} \to \mathcal{Y}$ and a dual downsampling mapping $D: \mathcal{Y} \to \mathcal{X}$ with the objective ${\mathcal{L}}_{DR}(P,D) = {\mathcal{L}}_{P}(P(x),y) + \lambda {\mathcal{L}}_{D}(D(P(x)),x)$, where $x$ is an LR input, $y$ is its HR ground truth, and $\lambda$ weights the dual term. This yields a smaller generalization bound ${\mathcal{B}}(P,D) \le {\mathcal{B}}(P)$. To obtain lightweight SR models, it introduces Dual Regression Compression (DRC), a two-stage pipeline: (i) channel-number search guided by ${\mathcal{L}}_{DR}$ to identify layer-wise redundancy, and (ii) channel pruning driven by a joint ${\mathcal{L}}_{M}$ and ${\mathcal{L}}_{DR}$ objective with an ${\ell}_{0}$ constraint. The framework is validated on CNN- and transformer-based SR architectures, achieving state-of-the-art accuracy with substantial reductions in parameters and FLOPs under both non-blind and blind degradation settings.
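A minimal sketch of the dual regression objective, to make the closed loop concrete. Everything concrete here is an assumption for illustration: images are plain nested lists, ${\mathcal{L}}_{P}$ and ${\mathcal{L}}_{D}$ are both taken to be mean absolute (L1) error, $\lambda = 0.1$ is an arbitrary default, and the hypothetical helpers `upsample_nearest` and `downsample_avg` stand in for the learned networks $P$ and $D$.

```python
from typing import Callable, List

Image = List[List[float]]


def l1(a: Image, b: Image) -> float:
    """Mean absolute error between two equally sized images (assumed loss)."""
    n = sum(len(row) for row in a)
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / n


def upsample_nearest(x: Image, s: int = 2) -> Image:
    """Hypothetical stand-in for the primal model P: nearest-neighbour upsampling by s."""
    return [[v for v in row for _ in range(s)] for row in x for _ in range(s)]


def downsample_avg(y: Image, s: int = 2) -> Image:
    """Hypothetical stand-in for the dual model D: s x s average pooling."""
    h, w = len(y) // s, len(y[0]) // s
    return [
        [sum(y[i * s + di][j * s + dj] for di in range(s) for dj in range(s)) / (s * s)
         for j in range(w)]
        for i in range(h)
    ]


def dual_regression_loss(
    P: Callable[[Image], Image],
    D: Callable[[Image], Image],
    x: Image,
    y: Image,
    lam: float = 0.1,
) -> float:
    """L_DR = L_P(P(x), y) + lam * L_D(D(P(x)), x)."""
    sr = P(x)                 # primal regression: LR -> HR
    primal = l1(sr, y)        # compare the SR output with the HR target
    dual = l1(D(sr), x)       # dual regression: map the SR output back to LR
    return primal + lam * dual
```

With these toy mappings, an SR output whose downsampled version matches the LR input pays no dual penalty, while an output that ignores the input is penalised by both terms.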

Abstract

Deep neural networks have exhibited remarkable performance in image super-resolution (SR) by learning a mapping from low-resolution (LR) images to high-resolution (HR) images. However, SR is an ill-posed problem, and existing methods suffer from several limitations. First, the space of possible SR mappings can be extremely large, since many different HR images may correspond to the same LR image. As a result, it is hard to directly learn a good SR mapping from such a large space. Second, achieving promising SR performance often requires very large models with extremely high computational cost. In practice, one can use model compression techniques to obtain compact models by reducing model redundancy. Nevertheless, existing model compression methods struggle to accurately identify the redundant components because of the extremely large SR mapping space. To alleviate the first challenge, we propose a dual regression learning scheme to reduce the space of possible SR mappings. Specifically, in addition to the mapping from LR to HR images, we learn a dual regression mapping that estimates the downsampling kernel and reconstructs the LR images. In this way, the dual mapping acts as a constraint that reduces the space of possible mappings. To address the second challenge, we propose a dual regression compression (DRC) method that reduces model redundancy at both the layer level and the channel level via channel pruning. Specifically, we first develop a channel number search method that minimizes the dual regression loss to determine the redundancy of each layer. Given the searched channel numbers, we further use the dual regression loss to evaluate the importance of each channel and prune the redundant ones. Extensive experiments demonstrate that our method yields accurate and efficient SR models.
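The claim that many HR images correspond to the same LR image can be checked by brute force on a tiny case. The sketch below assumes a toy degradation (averaging a flattened 2x2 HR patch into one LR pixel) and HR pixel values restricted to {0, 1, 2}; neither assumption comes from the paper, where the downsampling is estimated by the learned dual model.

```python
from itertools import product
from typing import List, Sequence, Tuple


def downsample_patch(patch: Sequence[int]) -> float:
    """Toy degradation (assumed): average a flattened 2x2 HR patch into one LR pixel."""
    return sum(patch) / len(patch)


def hr_patches_for(
    lr_value: float, levels: Tuple[int, ...] = (0, 1, 2), size: int = 4
) -> List[Tuple[int, ...]]:
    """Enumerate every HR patch with pixel values in `levels` that degrades to `lr_value`."""
    return [p for p in product(levels, repeat=size) if downsample_patch(p) == lr_value]


# An LR pixel of value 1 is consistent with 19 distinct HR patches, e.g. (1, 1, 1, 1)
# and (0, 2, 2, 0); only the all-zero patch degrades to an LR value of 0.
print(len(hr_patches_for(1)), len(hr_patches_for(0)))
```

The dual constraint $D(P(x)) \approx x$ does not single out one of these candidates. It only rules out SR outputs whose degraded version disagrees with the input, which is why it is combined with the primal loss rather than used alone.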

Paper Structure (25 sections, 1 theorem, 6 equations, 7 figures, 8 tables, 1 algorithm)

Introduction
Related Work
Image Super-resolution
Lightweight Model Techniques
Dual Learning
Dual Regression Networks
Dual Regression Learning for Super-Resolution
Dual Regression Compression
Dual Regression based Channel Number Search
Dual Regression based Channel Pruning
Experiments
Datasets and Implementation Details
Comparisons with State-of-the-art SR Methods
Comparisons with Lightweight SR Models
Model Compression of Blind SR Models
...and 10 more sections

Key Result

Theorem 1

Let ${\mathcal{L}}_{\rm DR}(P,D)$ be a mapping from ${\mathcal{X}} \times {\mathcal{Y}}$ to $[0, 1]$ and ${\mathcal{H}}_{dual}$ be the function space. Let $N$ denote the number of samples and $\hat{R}_{{\mathcal{Z}}}^{DL}$ represent the empirical Rademacher complexity [mohri2012foundations] of dual l…
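The statement above is truncated, so the exact form of Theorem 1 is not reproduced here. For context, the standard Rademacher bound for losses taking values in $[0, 1]$ (Mohri et al., Foundations of Machine Learning) states that with probability at least $1-\delta$, the expected loss is at most the empirical loss plus $2\hat{R} + 3\sqrt{\log(2/\delta)/(2N)}$. One intuition for ${\mathcal{B}}(P,D) \le {\mathcal{B}}(P)$, offered here as an illustration and not as the paper's proof, is that a supremum over a constrained subset of functions can never exceed the supremum over the full set. The sketch below estimates the empirical Rademacher complexity of a finite function class by Monte Carlo; the toy classes, the helper names, and $\delta = 0.05$ are all assumptions.

```python
import math
import random
from typing import Sequence


def empirical_rademacher(
    values: Sequence[Sequence[float]], trials: int = 2000, seed: int = 0
) -> float:
    """Monte Carlo estimate of E_sigma[ sup_h (1/N) sum_i sigma_i h(z_i) ].

    `values[h][i]` is the value of function h on sample z_i, so each row is one
    member of a finite function class evaluated on the same N samples.
    """
    rng = random.Random(seed)
    n = len(values[0])
    total = 0.0
    for _ in range(trials):
        sigma = [rng.choice((-1, 1)) for _ in range(n)]  # Rademacher signs
        total += max(sum(s * v for s, v in zip(sigma, row)) / n for row in values)
    return total / trials


def rademacher_bound(empirical_loss: float, rad: float, n: int, delta: float = 0.05) -> float:
    """Standard bound for [0, 1]-valued losses: L_hat + 2 R_hat + 3 sqrt(log(2/delta) / (2n))."""
    return empirical_loss + 2.0 * rad + 3.0 * math.sqrt(math.log(2.0 / delta) / (2.0 * n))
```

Because both estimates reuse the same seed, every trial draws the same signs for both classes, so the estimate for a subset is never larger than the estimate for the full class.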

Figures (7)

Figure 1: The proposed dual regression learning scheme contains a primal regression task for SR and a dual regression task to reconstruct LR images. The primal and dual regression tasks form a closed loop.
Figure 2: Overview of the dual regression compression (DRC) approach. Given a target compression ratio $r$, we first determine the redundancy of each layer by performing the dual regression based channel number search. Then, according to the searched channel numbers, we evaluate the importance of channels and prune the redundant ones to obtain the compressed model $\widehat{P}$.
Figure 3: The dual regression based channel pruning method. We evaluate the importance of channels by computing both the feature reconstruction loss ${\mathcal{L}}_{\rm M}$ and the dual regression loss ${\mathcal{L}}_{\rm DR}$. Here, ${\bf X}^{(l+1)}$ and $\widehat{{\bf X}}^{(l+1)}$ denote the output features of the $l$-th layer in the original model and the pruned model, respectively. $c_l$ and $\hat{c}_l$ denote the number of channels of the $l$-th layer in the original model and the pruned model, respectively. The red dashed box represents our proposed dual regression loss ${\mathcal{L}}_{\rm DR}$.
Figure 4: Overview of existing channel number search strategies and our importance-aware search strategy. (a) Some previous works [peng2019efficient, fang2020densely] treat different channel configurations independently. For two candidate numbers of channels $k_1$ and $k_2$ ($k_1 < k_2$), the selected $k_2$ channels are independent of the $k_1$ channels. (b) Other works [guo2020single, wan2020fbnetv2, wang2020revisiting] use a weight-sharing strategy to reduce the search cost. For two candidate numbers of channels $k_1$ and $k_2$ ($k_1 < k_2$), the selected $k_2$ channels contain all the $k_1$ channels, and the weights of these overlapping channels are shared across different channel sets during the search. (c) Building on the weight-sharing strategy, we propose an importance-aware search strategy that finds a suitable channel configuration to identify and reduce layer-wise redundancy. For each candidate number of channels $k$, we select the top-$k$ most important channels and ignore the remaining redundant ones. Note that the selected $k$ channels keep their original positions in the model, so the ranking operation does not alter the output features.
Figure 5: Visual comparisons of the images produced by different models for $4\times$ image super-resolution on benchmark datasets. We show that all the models enhanced by our DR consistently produce sharper images, i.e., with more high-frequency information, than their original counterparts.
...and 2 more figures
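To illustrate the channel number search stage in Figure 2, the sketch below picks per-layer channel counts for a plain chain of 3x3 convolutions so that the parameter count stays within a budget while a proxy score is minimised. This is a hypothetical simplification: exhaustive enumeration replaces the paper's weight-sharing search, the `score` callable stands in for the dual regression loss measured on the searched model, and translating the compression ratio $r$ into a parameter budget is an assumption.

```python
import itertools
from typing import Callable, List, Optional, Sequence, Tuple


def conv_params(channels: Sequence[int], k: int = 3) -> int:
    """Weights of a chain of k x k convolutions with the given channel counts (biases ignored)."""
    return sum(k * k * c_in * c_out for c_in, c_out in zip(channels[:-1], channels[1:]))


def search_channels(
    base: Sequence[int],
    candidates: Sequence[Sequence[int]],
    score: Callable[[List[int]], float],
    budget: int,
) -> Tuple[Optional[List[int]], float]:
    """Exhaustively try inner channel counts; keep the lowest-scoring config within budget.

    base: original channel counts [c_0, c_1, ..., c_L]; the input and output ends are fixed.
    candidates: allowed counts for each inner layer.
    score: hypothetical proxy for the dual regression loss of a configuration.
    """
    best: Optional[List[int]] = None
    best_score = float("inf")
    for inner in itertools.product(*candidates):
        cfg = [base[0], *inner, base[-1]]
        if conv_params(cfg) > budget:  # violates the compression target
            continue
        s = score(cfg)
        if s < best_score:
            best, best_score = cfg, s
    return best, best_score


# Toy usage: halve the parameters of a 3 -> 64 -> 64 -> 3 chain with a proxy that
# prefers wider layers; the balanced [3, 32, 32, 3] configuration wins.
base = [3, 64, 64, 3]
proxy = lambda cfg: sum(1.0 / c for c in cfg[1:-1])
print(search_channels(base, [(16, 32, 64), (16, 32, 64)], proxy, conv_params(base) // 2))
```

Exhaustive search grows exponentially with depth, which is one reason practical methods share weights across candidate configurations instead of enumerating them.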
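The importance-aware selection in Figure 4(c) can be sketched as a top-$k$ mask that keeps channels in place. The L1 norm of each channel's weights is used here as a hypothetical importance score, since the caption does not specify how importance is measured. Python's `sorted` is stable, so ties are broken consistently and the top-$k_1$ set is always contained in the top-$k_2$ set for $k_1 < k_2$, which matches the weight-sharing property in Figure 4(b).

```python
from typing import List, Sequence


def l1_importance(filters: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical importance score: L1 norm of each output channel's weights."""
    return [sum(abs(w) for w in f) for f in filters]


def topk_mask(importance: Sequence[float], k: int) -> List[int]:
    """Binary mask that keeps the k most important channels at their original positions."""
    order = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    keep = set(order[:k])
    return [1 if i in keep else 0 for i in range(len(importance))]


def apply_mask(features: Sequence[float], mask: Sequence[int]) -> List[float]:
    """Zero out ignored channels without reordering the kept ones."""
    return [f if m else 0.0 for f, m in zip(features, mask)]


importance = l1_importance([[0.1, -0.1], [0.5, -0.4], [0.05, 0.05], [-0.3, 0.2]])
mask = topk_mask(importance, 2)          # channels 1 and 3 are kept
print(mask, apply_mask([1.0, 2.0, 3.0, 4.0], mask))
```

Masking rather than gathering the selected channels is what keeps their positions unchanged, so the next layer sees each surviving channel in the same slot it occupied in the original model.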
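Figure 3 evaluates channels with both ${\mathcal{L}}_{\rm M}$ and ${\mathcal{L}}_{\rm DR}$ under an ${\ell}_{0}$ constraint on the number of kept channels $\hat{c}_l$. The sketch below approximates that combined objective with greedy forward selection, which is a swapped-in technique chosen for clarity, not the paper's procedure. Each input channel is modelled as an additive contribution vector to ${\bf X}^{(l+1)}$, ${\mathcal{L}}_{\rm M}$ is the mean squared error between the original and pruned features, and `dual_loss` is a hypothetical callable standing in for ${\mathcal{L}}_{\rm DR}$ evaluated on the pruned model.

```python
from typing import Callable, List, Sequence, Set


def feature_mse(contribs: Sequence[Sequence[float]], selected: Set[int]) -> float:
    """L_M stand-in: MSE between the full output feature and the one rebuilt from `selected`."""
    dim = len(contribs[0])
    full = [sum(c[d] for c in contribs) for d in range(dim)]
    kept = [sum(contribs[k][d] for k in selected) for d in range(dim)]
    return sum((f - p) ** 2 for f, p in zip(full, kept)) / dim


def greedy_prune(
    contribs: Sequence[Sequence[float]],
    keep: int,
    dual_loss: Callable[[Set[int]], float],
    lam: float = 0.1,
) -> List[int]:
    """Greedily add the channel that most reduces L_M + lam * L_DR until `keep` remain."""
    selected: Set[int] = set()
    while len(selected) < keep:  # l0 constraint: at most `keep` channels survive
        remaining = [k for k in range(len(contribs)) if k not in selected]
        best = min(
            remaining,
            key=lambda k: feature_mse(contribs, selected | {k}) + lam * dual_loss(selected | {k}),
        )
        selected.add(best)
    return sorted(selected)  # original channel indices, positions preserved


contribs = [[10.0, 0.0], [0.0, 0.1], [0.0, 5.0]]
print(greedy_prune(contribs, 2, lambda s: 0.0))                             # feature loss only
print(greedy_prune(contribs, 2, lambda s: 0.0 if 1 in s else 1000.0))      # dual term changes the choice
```

With the feature loss alone, the two channels that dominate the output are kept. A dual term that strongly favours channel 1 overrides that choice, which shows how ${\mathcal{L}}_{\rm DR}$ can retain channels that matter for reconstruction even when their direct contribution to the features is small.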

Theorems & Definitions (3)

Definition 1
Definition 2
Theorem 1