DRCT: Saving Image Super-resolution away from Information Bottleneck

Chih-Chung Hsu, Chia-Ming Lee, Yi-Shiuan Chou

TL;DR

The Dense-residual-connected Transformer (DRCT) is proposed to mitigate the loss of spatial information and stabilize the information flow through dense-residual connections between layers, thereby unleashing the model's potential and keeping it away from the information bottleneck.

Abstract

In recent years, Vision Transformer-based approaches to low-level vision tasks have achieved widespread success. Unlike CNN-based models, Transformers are more adept at capturing long-range dependencies, which enables image reconstruction that exploits non-local information. In super-resolution, Swin Transformer-based models have become mainstream thanks to their capacity for global spatial information modeling and their shifted-window attention mechanism, which facilitates the exchange of information between different windows. Many researchers have improved model performance by expanding the receptive field or designing elaborate networks, with commendable results. However, we observed a general phenomenon: feature map intensity is abruptly suppressed to small values towards the end of the network. This implies an information bottleneck and a loss of spatial information, which implicitly limits the model's potential. To address this, we propose the Dense-residual-connected Transformer (DRCT), which aims to mitigate the loss of spatial information and stabilize the information flow through dense-residual connections between layers, thereby unleashing the model's potential and keeping the model away from the information bottleneck. Experimental results indicate that our approach surpasses state-of-the-art methods on benchmark datasets and performs commendably in the NTIRE 2024 Image Super-Resolution (x4) Challenge. Our source code is available at https://github.com/ming053l/DRCT
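The central observation above, that feature map intensity is abruptly suppressed near the end of the network, can be checked with a small NumPy sketch. The text here does not define "intensity", so this sketch assumes it is the mean absolute activation of each layer's feature map; `feature_intensity` and `final_drop_ratio` are hypothetical helper names. In a real PyTorch model the per-layer maps could be collected with forward hooks (`torch.nn.Module.register_forward_hook`); here synthetic arrays stand in for them.

```python
import numpy as np


def feature_intensity(feature_maps):
    """Per-layer intensity, ASSUMED here to be the mean absolute activation."""
    return [float(np.mean(np.abs(f))) for f in feature_maps]


def final_drop_ratio(intensities):
    """Ratio of the last layer's intensity to the peak intensity.

    Values near 0 indicate a sharp suppression at the network's end;
    values near 1 indicate that intensity is preserved to the output.
    """
    peak = max(intensities)
    return intensities[-1] / peak if peak > 0 else 0.0


# Usage with synthetic (C, H, W) maps whose scale rises, then collapses at the last layer
maps = [np.full((4, 8, 8), v) for v in (1.0, 2.0, 3.0, 0.1)]
intensities = feature_intensity(maps)
print(intensities, final_drop_ratio(intensities))
```

A ratio close to 0 corresponds to the abrupt final suppression described in the abstract, while a ratio near 1 corresponds to the stable, higher final-layer intensity the authors report for DRCT.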

Paper Structure

This paper contains 20 sections, 12 equations, 6 figures, and 3 tables.

Figures (6)

  • Figure 1: Feature map intensity on various benchmark datasets. We observe that feature map intensities decrease sharply at the end of SISR networks, indicating potential information loss. In this paper, we propose DRCT to address this issue by enhancing receptive fields and adding dense connections within residual blocks to mitigate information bottlenecks, thereby improving performance with a simpler model design.
  • Figure 2: Feature map visualizations of, from top to bottom, SwinIR [SwinIR], HAT [HAT], and the proposed DRCT; positions further to the right correspond to deeper layers of the network. For both SwinIR and HAT, feature map intensity is high in the shallower layers but diminishes towards the end of the network. We consider that this phenomenon implies a loss of spatial information, which limits performance and creates an information bottleneck in SISR tasks. For the proposed DRCT, the learned feature maps are enhanced gradually and stably without obvious oscillations. This reflects a stable information flow during forward propagation, which yields higher intensity in the final layer's output. (Zoom in to better see the color bar beside each feature map.)
  • Figure 3: The overall architecture of the proposed Dense-residual-connected Transformer (DRCT) and the structure of the Residual-Dense Group (RDG). Each RDG contains five consecutive Swin-Dense-Residual-Connected Blocks (SDRCBs). Integrating dense connections [huang2018densely] into SwinIR [SwinIR] improves efficiency and keeps image super-resolution away from the information bottleneck.
  • Figure 4: Visual comparison on ×4 SISR. The patches used for comparison are marked with red boxes in the original images. Higher PSNR/SSIM indicates better performance.
  • Figure 5: LAM [LAM] visualization. DRCT improves performance by enlarging the receptive field, which mitigates the loss of spatial information in deeper layers of the network.
  • ...and 1 more figure
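The dense-residual wiring described in the Figure 3 caption (an RDG of five SDRCBs) can be illustrated with a toy NumPy sketch. This is not the authors' implementation: each SDRCB's Swin Transformer layers are replaced by a hypothetical 1×1 channel projection followed by LeakyReLU, and `growth` (channels added per block), the 0.2 LeakyReLU slope, and the class name `ResidualDenseGroupSketch` are all assumptions. What it shows is the connectivity: every block receives the concatenation of the group input and all earlier block outputs, and the last block fuses everything back to the input width so a residual connection can add it to the group input.

```python
import numpy as np


def channel_linear(x, w):
    """1x1 convolution: x has shape (C_in, H, W), w has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)


def leaky_relu(x, slope=0.2):
    # Slope 0.2 is an assumed value, not taken from the paper.
    return np.where(x > 0, x, slope * x)


class ResidualDenseGroupSketch:
    """Hypothetical stand-in for an RDG; Swin layers are replaced by 1x1 projections."""

    def __init__(self, channels, growth, num_blocks=5, seed=0):
        rng = np.random.default_rng(seed)
        self.weights = []
        in_ch = channels
        # The first num_blocks - 1 blocks each add `growth` new channels.
        for _ in range(num_blocks - 1):
            self.weights.append(rng.normal(0.0, 0.1, size=(growth, in_ch)))
            in_ch += growth
        # The last block fuses all concatenated features back to `channels`.
        self.weights.append(rng.normal(0.0, 0.1, size=(channels, in_ch)))

    def __call__(self, x):
        features = [x]
        for w in self.weights[:-1]:
            # Dense connection: each block sees the input and all earlier outputs.
            dense_input = np.concatenate(features, axis=0)
            features.append(leaky_relu(channel_linear(dense_input, w)))
        fused = channel_linear(np.concatenate(features, axis=0), self.weights[-1])
        # Residual connection around the whole group.
        return x + fused


group = ResidualDenseGroupSketch(channels=8, growth=16, num_blocks=5)
x = np.random.default_rng(1).normal(size=(8, 4, 4))
print(group(x).shape)
```

Because the group output is the input plus a learned correction, the input signal reaches the end of the group unchanged even if the fused branch is small; this is one way to read the caption's claim that dense-residual connections stabilize the information flow.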