Multi-level Attention-guided Graph Neural Network for Image Restoration

Jiatao Jiang, Zhen Cui, Chunyan Xu, Jian Yang

TL;DR

The paper tackles image restoration by addressing the gap in leveraging global feature maps alongside local structure. It introduces MAGN, a multi-level attention-guided graph network that couples a global representation graph with a local structure graph via a multi-head attention mechanism to dynamically learn graph adjacencies. The method demonstrates state-of-the-art performance across denoising, compression artifact reduction, and demosaicing on standard benchmarks while maintaining a compact parameter count. This approach provides an end-to-end, graph-based enhancement to CNNs, enabling robust restoration through explicit global-local information fusion with efficient computation.

Abstract

In recent years, deep learning has achieved remarkable success in the field of image restoration. However, most convolutional neural network-based methods typically focus on a single scale, neglecting the incorporation of multi-scale information. In image restoration tasks, local features of an image are often insufficient, necessitating the integration of global features to complement them. Although recent neural network algorithms have made significant strides in feature extraction, many models do not explicitly model global features or consider the relationship between global and local features. This paper proposes a multi-level attention-guided graph neural network. The proposed network explicitly constructs element block graphs and element graphs within feature maps using multi-attention mechanisms to extract both local structural features and global representation information of the image. Since the network struggles to effectively extract global information during image degradation, the structural information of local feature blocks can be used to correct and supplement the global information. Similarly, when element block information in the feature map is missing, it can be refined using global element representation information. The graph within the network learns real-time dynamic connections through the multi-attention mechanism, and information is propagated and aggregated via graph convolution algorithms. By combining local element block information and global element representation information from the feature map, the algorithm can more effectively restore missing information in the image. Experimental results on several classic image restoration tasks demonstrate the effectiveness of the proposed method, achieving state-of-the-art performance.

Paper Structure

This paper contains 21 sections, 14 equations, 14 figures, 6 tables.

Figures (14)

  • Figure 1: The diagram illustrates the graph structure constructed between pixel patches in an image. The four red boxes highlight local patches of a building, which exhibit highly similar representation and structural information. Similarly, the four blue boxes highlight local patches of grass, which also share analogous features. For both the building and grass patches, a connected graph can be constructed to propagate and aggregate local structural information and global representation information across the image. This process supplements and refines the missing or incomplete details in each patch by leveraging the shared characteristics of similar regions. Through this graph-based approach, the network effectively integrates local and global information, enhancing the overall restoration and representation of the image.
  • Figure 2: The framework of our proposed model. The network contains two modules: i) a residual block module; ii) a graph-based block module. The residual block module stacks several residual blocks to learn the residual information. The graph-based block module mainly employs graph convolution with multi-headed attention to learn the interaction information between pixels and patches in the image. The framework learns a residual of the original image to restore the high-quality image.
  • Figure 3: The comparison of the two basic modules. The residual module employs two convolution layers to learn residuals of the input and stacks $n_1$ basic residual blocks. The graph-based module employs two graph convolution layers to learn residuals of the input and has $n_2$ graph-based blocks. The graph-based block propagates interactive information on images with $N$ multi-headed attention.
  • Figure 4: Graph generator. The generator mainly employs the attention mechanism to construct an adjacency matrix of the inputs. The feature map is projected to the representation space $q, k$ by the representation function. Then, the similarity matrix $\mathbf{S}$ is calculated from the representation features $q, k$. In addition, the mask matrix suppresses the disturbance from low-similarity nodes. Finally, the adjacency matrix $\mathbf{A}$ contains the interaction relations between the input data.
  • Figure 5: Information aggregation on pixels and patches. The bottom half is information aggregation on pixels. The graph generator constructs an adjacency matrix $\mathbf{A}$ with the attention mechanism on pixels. More details can be found in the graph convolution section. The graph convolution operation extracts the global information and updates the feature map $\mathbf{X}_1$ on the pixel graph $\mathcal{G}=(\mathbf{X}, \mathbf{A})$. The top half is information aggregation on patches. The feature map is unfolded into patches $p_1 \in \mathbb{R}^{H_p\times W_p\times C\times L}$. Then, the adjacent relations $\mathbf{A}_p$ are modeled between $L$ patches. More details can be found in the graph convolution section. The $L$ patches are folded into a feature map after the graph convolution operation $g_p(p_3, \mathbf{A}_p)$. Finally, two parts of the feature map are joined together as the residuals of the input. The output is the sum of the original feature map and the residuals on pixels and patches.
  • ...and 9 more figures
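
The graph generator (Figure 4) and the pixel/patch aggregation (Figure 5) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it uses a single attention head instead of the multi-head design, a top-k rule as a stand-in for the mask matrix (the captions do not specify how the mask is computed), non-overlapping patches, and a plain sum to join the pixel and patch residuals. All function names, parameter names, and shapes here are assumptions chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; entries equal to -inf become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def graph_generator(x, w_q, w_k, top_k):
    """Build a row-normalised adjacency A from node features x of shape (N, C).

    Assumption: the mask keeps the top_k most similar nodes per row.
    """
    q = x @ w_q                                # representation q
    k = x @ w_k                                # representation k
    s = (q @ k.T) / np.sqrt(q.shape[1])        # similarity matrix S
    thresh = np.sort(s, axis=1)[:, -top_k][:, None]
    s_masked = np.where(s >= thresh, s, -np.inf)  # drop low-similarity nodes
    return softmax(s_masked, axis=1)           # adjacency matrix A


def graph_conv(x, a, w):
    # One propagation step: aggregate neighbours with A, then project with w.
    return a @ x @ w


def unfold(fmap, p):
    # (H, W, C) -> (L, p*p*C) with L non-overlapping p x p patches.
    h, w, c = fmap.shape
    x = fmap.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, p * p * c)


def fold(patches, h, w, c, p):
    # Inverse of unfold: (L, p*p*C) -> (H, W, C).
    x = patches.reshape(h // p, w // p, p, p, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(h, w, c)


def graph_block(fmap, params, p, top_k):
    """Pixel-graph and patch-graph residuals added to the input feature map."""
    h, w, c = fmap.shape
    # Pixel branch: every spatial position is a node (global representation).
    x = fmap.reshape(h * w, c)
    a = graph_generator(x, params["q"], params["k"], top_k)
    pixel_res = graph_conv(x, a, params["w"]).reshape(h, w, c)
    # Patch branch: every patch is a node (local structure).
    pt = unfold(fmap, p)
    a_p = graph_generator(pt, params["q_p"], params["k_p"], top_k)
    patch_res = fold(graph_conv(pt, a_p, params["w_p"]), h, w, c, p)
    # Assumption: the two residuals are joined by summation.
    return fmap + pixel_res + patch_res


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, W, C, P, D = 8, 8, 4, 2, 4
    fmap = rng.standard_normal((H, W, C))
    params = {
        "q": rng.standard_normal((C, D)) * 0.1,
        "k": rng.standard_normal((C, D)) * 0.1,
        "w": rng.standard_normal((C, C)) * 0.1,
        "q_p": rng.standard_normal((P * P * C, D)) * 0.1,
        "k_p": rng.standard_normal((P * P * C, D)) * 0.1,
        "w_p": rng.standard_normal((P * P * C, P * P * C)) * 0.1,
    }
    out = graph_block(fmap, params, p=P, top_k=5)
    print(out.shape)
```

In this sketch the pixel graph has $H \cdot W$ nodes, so the dense similarity matrix grows quadratically with image size; the masking step keeps the aggregation sparse in effect but does not reduce the cost of computing $\mathbf{S}$ itself.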