Deep Unfolding Real-Time Super-Resolution Using Subpixel-Shift Twin Image and Convex Self-Similarity Prior

Chia-Hsiang Lin, Wei-Chih Liu, Yu-En Chiu, Jhao-Ting Lin

TL;DR

This work formulates the less-investigated twin-image super-resolution (TISR) problem using a convex criterion, implements it with a novel deep unfolding network, and proposes the convex self-similarity unfolding supermode super-resolution (COSUP) algorithm, which achieves state-of-the-art performance with millisecond-level computational time.

Abstract

Multi-image super-resolution (MISR) is a critical technique for satellite remote sensing. From an information perspective, twin-image super-resolution (TISR) is regarded as the most challenging MISR scenario, with crucial applications such as SPOT-5 supermode imaging. In TISR, an image is super-resolved with the help of its subpixel-shift counterpart (i.e., twin image), where the two images are typically offset by half a pixel both horizontally and vertically. We formulate the less-investigated TISR problem using a convex criterion, which is implemented by a novel deep unfolding network. In the unfolding, an embedded simple shift operator handles the coupled TISR data-fitting terms, and a transformer trained with a convex self-similarity loss function implements the proximal mapping induced by the TISR regularizer. The proposed convex self-similarity unfolding supermode super-resolution (COSUP) algorithm is interpretable and achieves state-of-the-art performance with millisecond-level computational time. COSUP is also tested on real-world data, for which the subpixel shifts may not be spatially uniform, and the results clearly outperform the official CNES supermode imaging product on credible metrics (e.g., the natural image quality evaluator, NIQE). Source code: https://github.com/IHCLab/COSUP.
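To make the twin-image geometry concrete, the following is a minimal sketch of how a subpixel-shift pair could be simulated from a known HR image. It is not taken from the COSUP source code: the function names `box_downsample` and `simulate_twin_pair`, the 2x2 box-averaging sensor model, and the circular boundary handling are all assumptions chosen for illustration. The only facts used from the abstract are that the two LR images are offset by half an LR pixel in both directions; the factor of 2 between LR and HR sampling follows the paper's Figure 1.

```python
import numpy as np

def box_downsample(hr, factor=2):
    """Average non-overlapping factor x factor blocks (assumed sensor model)."""
    h, w = hr.shape
    h, w = h - h % factor, w - w % factor  # crop so the blocks tile exactly
    blocks = hr[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

def simulate_twin_pair(z, factor=2):
    """Return (y1, y2), where y2 samples the scene half an LR pixel down and right."""
    shift = factor // 2  # half an LR pixel equals `shift` HR pixels
    y1 = box_downsample(z, factor)
    # Circular shift is an assumption made only to keep the sketch short.
    z_shifted = np.roll(z, shift=(-shift, -shift), axis=(0, 1))
    y2 = box_downsample(z_shifted, factor)
    return y1, y2

z = np.arange(64, dtype=float).reshape(8, 8)  # toy HR image
y1, y2 = simulate_twin_pair(z)
print(y1.shape, y2.shape)
```

Under this toy model each LR pixel of `y2` straddles four neighbouring LR pixels of `y1`, which is why the twin image carries extra information about the HR grid.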

Paper Structure (13 sections, 15 equations, 9 figures, 5 tables)

This paper contains 13 sections, 15 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Graphical illustration of the considered twin-image super-resolution (TISR) problem. The unknown target image ${\bm z}$ is the HR image (gray). The aim is to super-resolve the LR image ${\bm y}_1$ (red) to obtain the target HR image ${\bm z}$, where the ground sampling distance (GSD) of the LR image is twice that of the target image. TISR is achieved with the help of the subpixel-shift twin image ${\bm y}_2$ (green). The green and red images are typically offset by half a pixel both horizontally and vertically.
  • Figure 2: Deep unfolding architecture of the proposed COSUP convex algorithm (cf. Section \ref{sec:algodesign}). For the intermediate stages $k=2,\dots,K-1$, the architecture is built from the three ADMM closed-form solutions (cf. Section \ref{sec:algodesign}) for updating the primal variables $({\bm z},{\bm x})$ and the dual variable ${\bm d}$. The intermediate-stage architecture is augmented to incorporate the initialization for Stage 1, and simplified to compute only the TISR result ${\bm z}$ for Stage $K$. Model 1 is the Transformer-driven proximal operator for the primal update of ${\bm z}$. Model 2, a lightweight design derived from the Woodbury matrix identity, handles the primal update of ${\bm x}$. The dual update of ${\bm d}$ is inherently implemented by the deployed network architecture.
  • Figure 3: Model 1 corresponds to the primal update of ${\bm z}$, and is designed based on the physical meaning of the proximal operator. The Swin-Transformer, together with the embedded W-MSA and SW-MSA modules, is further illustrated in Figure \ref{fig:swinT}. Model 2 implements the primal update of ${\bm x}$, and is designed based on the Woodbury matrix inversion lemma \cite{CVXbookCLL2016}. The symmetric fully connected (SFC) layer contains a learnable symmetric matrix $\bm\Phi$. Both models serve as the elementary blocks of the overall deep unfolding network of the COSUP algorithm, as depicted in Figure \ref{fig:stage_k}.
  • Figure 4: Detailed architectures of the Swin-T and the ResBlock, and the graphical illustration of the window-based multi-head self-attention (W-MSA) and shifted window multi-head self-attention (SW-MSA). The joint adoption of W-MSA and SW-MSA facilitates the learning of both local features and cross-window dependencies.
  • Figure 5: Visualized SR results across four representative scenes, including (a) farm, (b) city, (c) mountain, and (d) coastline. The HR image denotes the ground-truth ${\bm z}$, and the LR image is the input ${\bm y}_1$, as depicted in Figure \ref{fig:TISRillustration}. The numbers below each subimage denote the PSNR/SSIM values.
  • ...and 4 more figures
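
The captions for Figures 2 and 3 state that Model 2 is derived from the Woodbury matrix identity. As a rough illustration of why that identity makes an update lightweight, the sketch below checks numerically that an n x n inverse of the form $(\rho{\bm I} + {\bm B}^T{\bm B})^{-1}$, which is common in ADMM primal updates, can be obtained by solving only a much smaller m x m system. The matrix `B`, the penalty `rho`, and the function name `woodbury_inverse` are hypothetical; the actual matrices used in COSUP are not given in this summary.

```python
import numpy as np

def woodbury_inverse(B, rho):
    """Invert (rho*I_n + B.T @ B) by solving an m x m system, where m = B.shape[0].

    Uses the Woodbury identity:
    (rho*I_n + B^T B)^{-1} = (I_n - B^T (rho*I_m + B B^T)^{-1} B) / rho
    """
    m, n = B.shape
    small = rho * np.eye(m) + B @ B.T  # m x m instead of n x n
    return (np.eye(n) - B.T @ np.linalg.solve(small, B)) / rho

rng = np.random.default_rng(0)
B = rng.standard_normal((3, 12))  # hypothetical wide operator, m=3 and n=12
rho = 0.5                         # hypothetical ADMM penalty parameter

direct = np.linalg.inv(rho * np.eye(12) + B.T @ B)
fast = woodbury_inverse(B, rho)
max_err = np.max(np.abs(direct - fast))
print(max_err)
```

The saving grows with the gap between m and n, since the cost of the solve depends on the smaller dimension only.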