Key-Graph Transformer for Image Restoration

Bin Ren, Yawei Li, Jingyun Liang, Rakesh Ranjan, Mengyuan Liu, Rita Cucchiara, Luc Van Gool, Nicu Sebe

TL;DR

Image restoration (IR) requires global contextual information, but standard ViT-based approaches incur high computational cost because dense self-attention scales poorly at high resolutions. The paper introduces the Key-Graph Transformer (KGT), which constructs a sparse, representative Key-Graph per stage via a KNN-based Key-Graph Constructor and applies Key-Graph Attention over only the selected neighbors, reducing complexity from $O((HW)^2)$ to $O(HW \cdot k)$ while preserving essential non-local cues. It further shares the Key-Graph across all KGT layers within a stage and provides multiple implementation and training strategies, supported by extensive ablations. Across six IR tasks, KGT achieves state-of-the-art results with notable efficiency, handles irregular image content effectively, and generalizes to multiple degradation levels; code will be released for reproducibility and broader use.

Abstract

While it is crucial to capture global information for effective image restoration (IR), integrating such cues into transformer-based methods becomes computationally expensive, especially with high input resolution. Furthermore, the self-attention mechanism in transformers is prone to considering unnecessary global cues from unrelated objects or regions, introducing computational inefficiencies. In response to these challenges, we introduce the Key-Graph Transformer (KGT) in this paper. Specifically, KGT views patch features as graph nodes. The proposed Key-Graph Constructor efficiently forms a sparse yet representative Key-Graph by selectively connecting essential nodes instead of all the nodes. Then the proposed Key-Graph Attention is conducted under the guidance of the Key-Graph only among selected nodes with linear computational complexity within each window. Extensive experiments across 6 IR tasks confirm the proposed KGT's state-of-the-art performance, showcasing advancements both quantitatively and qualitatively.
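
The two steps described above (building a sparse graph, then attending only along its edges) can be illustrated with a small sketch. This is not the authors' implementation: the function names `build_key_graph` and `key_graph_attention`, the use of a plain dot product as the similarity measure, and the toy feature vectors are all assumptions made for illustration. Note that this naive constructor still scores every pair of nodes once; only the attention step is restricted to $k$ neighbors per node.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def build_key_graph(queries, keys, k):
    """For each query node, return the indices of its k most similar key nodes.

    Similarity is a plain dot product (an assumption for this sketch).
    """
    graph = []
    for q in queries:
        scores = [dot(q, key) for key in keys]
        # Rank key indices by descending similarity and keep the top k.
        ranked = sorted(range(len(keys)), key=lambda j: scores[j], reverse=True)
        graph.append(ranked[:k])
    return graph


def key_graph_attention(queries, keys, values, graph):
    """Softmax attention where each query attends only to its graph neighbors."""
    scale = math.sqrt(len(queries[0]))
    outputs = []
    for q, neighbors in zip(queries, graph):
        logits = [dot(q, keys[j]) / scale for j in neighbors]
        peak = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(x - peak) for x in logits]
        total = sum(weights)
        # Weighted sum of the selected value vectors, channel by channel.
        mixed = [
            sum(w / total * values[j][c] for w, j in zip(weights, neighbors))
            for c in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs


# Toy input: four 2-D node features forming two obvious groups.
nodes = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
graph = build_key_graph(nodes, nodes, k=2)  # [[0, 1], [0, 1], [2, 3], [2, 3]]
restored = key_graph_attention(nodes, nodes, nodes, graph)
```

With $N$ nodes, the attention loop evaluates $N \cdot k$ query-key products instead of $N^2$. Setting `k` to the number of nodes recovers ordinary dense attention, which makes the sketch easy to check against a reference.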

Paper Structure

This paper contains 11 sections, 6 equations, 6 figures, 9 tables, and 1 algorithm.

Figures (6)

  • Figure 1: (a) The CNN filter captures information only within a local region. (b) Standard MLP/Transformer architectures take the full input as one long sequence. (c) The window-size multi-head self-attention (MSA) mechanism builds a fully connected dense graph within each window. (d) Position-fixed sparse graph. (e) The proposed Key-Graph connects only the essential nodes.
  • Figure 2: (a) The proposed KGT mainly consists of a convolutional feature extractor, the main body of the proposed KGT for representation learning, and an image reconstructor. The main body shown here is for SR, while the U-shaped structure (shown in the appendix) is used for other IR tasks. (b) Illustration of the Key-Graph Transformer layer within each KGT stage.
  • Figure 3: A toy example with $k=3$ illustrating the Key-Graph Constructor (a) and the Key-Graph Attention (b) within each KGT layer.
  • Figure 4: Ablation study on the impact of $k$. The size of the circle denotes the FLOPs. The $k$ on the horizontal axis is the one used during inference.
  • Figure 5: One model is trained to handle multiple degradation levels for denoising (a-b) and JPEG CAR (c-d).
  • ...and 1 more figures