Learning Dual-Level Deformable Implicit Representation for Real-World Scale Arbitrary Super-Resolution

Zhiheng Li; Muheng Li; Jixuan Fan; Lei Chen; Yansong Tang; Jiwen Lu; Jie Zhou

TL;DR

A Dual-level Deformable Implicit Representation (DDIR) is proposed for real-world scale arbitrary super-resolution; it achieves state-of-the-art performance on the RealArbiSR and RealSR benchmarks.

Abstract

Scale arbitrary super-resolution based on implicit image functions is gaining popularity because it can better represent the visual world in a continuous manner. However, existing scale arbitrary works are trained and evaluated on simulated datasets, where low-resolution images are generated from their ground truths by the simplest bicubic downsampling. These models generalize poorly to real-world scenarios because real-world degradations are more complex. To address this issue, we build the RealArbiSR dataset, a new real-world super-resolution benchmark with both integer and non-integer scaling factors for the training and evaluation of real-world scale arbitrary super-resolution. Moreover, we propose a Dual-level Deformable Implicit Representation (DDIR) to solve real-world scale arbitrary super-resolution. Specifically, we design the appearance embedding and deformation field to handle both image-level and pixel-level deformations caused by real-world degradations. The appearance embedding models the characteristics of low-resolution inputs to deal with photometric variations at different scales, and the pixel-based deformation field learns the RGB differences that result from the deviations between real-world and simulated degradations at arbitrary coordinates. Extensive experiments show that our trained model achieves state-of-the-art performance on the RealArbiSR and RealSR benchmarks for real-world scale arbitrary super-resolution. The dataset and code are available at https://github.com/nonozhizhiovo/RealArbiSR.
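To make the idea of querying an implicit image function at arbitrary coordinates concrete, the following is a minimal sketch of how query coordinates for a non-integer scale factor might be generated. The function names, the pixel-center convention, the normalization to [-1, 1], and the nearest-pixel lookup are all assumptions chosen for illustration; the abstract does not specify them, and the released code may do this differently.

```python
import numpy as np

def make_query_coords(lr_h, lr_w, scale):
    """Hypothetical helper: pixel-center coordinates in [-1, 1] for the
    output grid of an LR image of size (lr_h, lr_w) upscaled by `scale`."""
    out_h = int(round(lr_h * scale))
    out_w = int(round(lr_w * scale))
    # Pixel centers (i + 0.5) / n, mapped from [0, 1] to [-1, 1].
    ys = (np.arange(out_h) + 0.5) / out_h * 2.0 - 1.0
    xs = (np.arange(out_w) + 0.5) / out_w * 2.0 - 1.0
    grid = np.stack(np.meshgrid(ys, xs, indexing="ij"), axis=-1)
    return grid.reshape(-1, 2)  # (out_h * out_w, 2), ordered (y, x)

def nearest_lr_index(coords, lr_h, lr_w):
    """Hypothetical helper: index of the LR pixel whose cell contains
    each normalized query coordinate."""
    iy = np.clip(np.floor((coords[:, 0] + 1.0) / 2.0 * lr_h).astype(int), 0, lr_h - 1)
    ix = np.clip(np.floor((coords[:, 1] + 1.0) / 2.0 * lr_w).astype(int), 0, lr_w - 1)
    return iy, ix

# Usage: a non-integer factor such as x3.7 needs no special handling,
# because the coordinates are continuous.
coords = make_query_coords(10, 12, 3.7)
iy, ix = nearest_lr_index(coords, 10, 12)
print(coords.shape, iy.max(), ix.max())
```

Because the query grid is defined in continuous coordinates, the same code path serves integer and non-integer factors alike, which is the property the RealArbiSR benchmark is designed to exercise.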

Paper Structure (24 sections, 4 equations, 10 figures, 13 tables)

Introduction
Related Work
The RealArbiSR Dataset
Camera Calibration
Dataset Collection
Methods
Analysis of Real-World Scale Arbitrary SR
Overview
Appearance Embedding
Deformation Field
Network Architecture and Training
Experiment
Experiment Setup
Comparisons with State-of-the-Art
Analysis of Scale Factors in RealArbiSR Dataset
...and 9 more sections

Figures (10)

Figure 1: (a) We propose a Dual-level Deformable Implicit Representation (DDIR) to solve real-world scale arbitrary SR, simulating the continuous optical zoom of a DSLR camera with a single model. We compare (b) the HR image with the SR results ($\times 3.7$) of a real-world LR image generated by (c) CiaoSR trained on the DIV2K dataset with bicubic degradation (CiaoSR+BD), (d) CiaoSR trained on the RealArbiSR dataset with real-world degradation (CiaoSR+RealArbiSR), and (e) our DDIR model trained on the RealArbiSR dataset with real-world degradation (DDIR+RealArbiSR).
Figure 2: (a) The ground-truth images; (b) The images computed by subtracting from the ground truths their synthetic low-resolution versions, which have been bicubically upscaled to the same resolution as the ground truth; (c) The images computed by subtracting from the ground truths their real-world low-resolution versions (bicubically upscaled); (d) The comparison of colour histograms between the ground truths and their real-world low-resolution versions (bicubically upscaled).
Figure 3: The training pipeline of our DDIR model. It consists of double branches, which are the deformation branch and the SR branch. Each branch is composed of an encoder and an MLP, taking the LR image and query coordinates as the inputs. The appearance embedding $l_a$ is computed as the spatial average pooling of the 2D feature map from the encoder $E_{\phi}^{sr}$ of the SR branch, which is fed into the decoding function $f_{\theta'}^{d}$ of the deformation branch by concatenation. The RGB output of the deformation branch is supervised by the deformation field. Then, the predicted deformation field feeds into the decoding function $f_{\theta}^{sr}$ of the SR branch by concatenation. Finally, the decoding function $f_{\theta}^{sr}$ of the SR branch outputs the target high-resolution RGB values at the query coordinates. Combining the appearance embedding and the deformation field, our DDIR model learns the dual-level deformable implicit representation to address the deformations at the image and pixel levels simultaneously.
Figure 4: Qualitative comparisons between different methods on benchmarks. Zoom in for a better view.
Figure 5: The checkerboards for the calibration of the focal lengths with the scale factors of (a) $\times$1.5, $\times$2.0, $\times$2.5, $\times$3.0, $\times$3.5, $\times$4.0; and (b) $\times$1.7, $\times$2.3, $\times$2.7, $\times$3.3, $\times$3.7.
...and 5 more figures
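The data flow described in the Figure 3 caption can be sketched as a forward pass. This is only an illustrative sketch under stated assumptions: the encoders are replaced by a stand-in per-pixel projection, the MLPs have two layers with random weights, features are looked up at the nearest LR pixel, and all names and sizes (`stub_encoder`, `ddir_forward`, `c=8`, `hidden=16`) are hypothetical. Only the wiring follows the caption: the appearance embedding $l_a$ is the spatial average of the SR encoder's feature map, it is concatenated into the deformation decoder's input, and the predicted deformation is concatenated into the SR decoder's input. Training and the supervision of the deformation branch are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def stub_encoder(lr, weight):
    """Stand-in for an encoder (assumption): per-pixel linear projection
    followed by tanh, mapping (H, W, 3) to (H, W, C)."""
    return np.tanh(lr @ weight)

def mlp(x, w1, w2):
    """Two-layer MLP with ReLU (assumed depth); outputs 3 RGB channels."""
    return np.maximum(x @ w1, 0.0) @ w2

def sample_nearest(feat, coords):
    """Feature of the nearest LR pixel for (y, x) coords in [-1, 1]."""
    h, w, _ = feat.shape
    iy = np.clip(np.floor((coords[:, 0] + 1) / 2 * h).astype(int), 0, h - 1)
    ix = np.clip(np.floor((coords[:, 1] + 1) / 2 * w).astype(int), 0, w - 1)
    return feat[iy, ix]

def init_params(c=8, hidden=16):
    """Random weights; the sizes are illustrative, not from the paper."""
    return {
        "enc_sr": rng.normal(size=(3, c)),
        "enc_d": rng.normal(size=(3, c)),
        "d_w1": rng.normal(size=(c + 2 + c, hidden)),   # feature + coords + l_a
        "d_w2": rng.normal(size=(hidden, 3)),
        "sr_w1": rng.normal(size=(c + 2 + 3, hidden)),  # feature + coords + deformation
        "sr_w2": rng.normal(size=(hidden, 3)),
    }

def ddir_forward(lr, coords, params):
    feat_sr = stub_encoder(lr, params["enc_sr"])  # SR-branch feature map
    feat_d = stub_encoder(lr, params["enc_d"])    # deformation-branch feature map

    # Image level: appearance embedding = spatial average pooling of the
    # SR encoder's 2D feature map.
    l_a = feat_sr.mean(axis=(0, 1))
    l_a_tiled = np.broadcast_to(l_a, (coords.shape[0], l_a.shape[0]))

    # Pixel level: the deformation decoder sees l_a by concatenation.
    d_in = np.concatenate([sample_nearest(feat_d, coords), coords, l_a_tiled], axis=1)
    deform = mlp(d_in, params["d_w1"], params["d_w2"])

    # The SR decoder sees the predicted deformation by concatenation.
    sr_in = np.concatenate([sample_nearest(feat_sr, coords), coords, deform], axis=1)
    rgb = mlp(sr_in, params["sr_w1"], params["sr_w2"])
    return rgb, deform, l_a

# Usage with a random 6x5 LR image and 20 random query coordinates.
params = init_params()
lr = rng.random((6, 5, 3))
coords = rng.uniform(-1.0, 1.0, size=(20, 2))
rgb, deform, l_a = ddir_forward(lr, coords, params)
print(rgb.shape, deform.shape, l_a.shape)
```

The sketch keeps the two levels separate on purpose: `l_a` is a single vector per image, while `deform` varies per query coordinate, mirroring the image-level and pixel-level roles named in the caption.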
