
GLMHA: A Guided Low-rank Multi-Head Self-Attention for Efficient Image Restoration and Spectral Reconstruction

Zaid Ilyas, Naveed Akhtar, David Suter, Syed Zulqarnain Gilani

TL;DR

An instance-Guided Low-rank Multi-Head self-attention (GLMHA) is proposed to replace Channel-wise Self-Attention (CSA) for a considerable computational gain while closely retaining the original model performance.

Abstract

Image restoration and spectral reconstruction are longstanding computer vision tasks. Currently, CNN-transformer hybrid models provide state-of-the-art performance for these tasks. The key common ingredient in the architectural designs of these models is Channel-wise Self-Attention (CSA). We first show that CSA is an overall low-rank operation. Then, we propose an instance-Guided Low-rank Multi-Head self-attention (GLMHA) to replace the CSA for a considerable computational gain while closely retaining the original model performance. Unique to the proposed GLMHA is its ability to provide computational gain for both short and long input sequences. In particular, the gain is in terms of both Floating Point Operations (FLOPs) and parameter count reduction. This is in contrast to the existing popular computational complexity reduction techniques, e.g., Linformer, Performer, and Reformer, for which the FLOPs overhead outweighs the efficient design tricks on shorter input sequences. Moreover, parameter reduction remains unaccounted for in the existing methods. We perform an extensive evaluation for the tasks of spectral reconstruction from RGB images, spectral reconstruction from snapshot compressive imaging, motion deblurring, and image deraining by enhancing the best-performing models with our GLMHA. Our results show up to a 7.7 Giga FLOPs reduction with 370K fewer parameters required to closely retain the original performance of the best-performing models that employ CSA.
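The low-rank claim above can be probed numerically. The following is a minimal sketch, not the authors' code: it assumes a single head, uses hypothetical channel-mixing weights `w_q`, `w_k`, `w_v` in place of the learned projections, and measures the cumulative distribution of singular values (a stand-in for the eigenvalue analysis mentioned in the abstract). The helper names `channel_self_attention` and `cumulative_spectrum` are illustrative only.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def channel_self_attention(x, w_q, w_k, w_v):
    """Single-head channel-wise self-attention (illustrative sketch).

    x:   (C, H, W) feature map.
    w_*: (C, C) channel-mixing weights (hypothetical stand-ins for the
         learned query/key/value projections).
    Returns the attended tokens (C, H*W) and the (C, C) attention map.
    """
    c, h, w = x.shape
    tokens = x.reshape(c, h * w)  # each channel is a token, HW is its embedding
    q, k, v = w_q @ tokens, w_k @ tokens, w_v @ tokens
    attn = softmax(q @ k.T / np.sqrt(h * w), axis=-1)  # (C, C), not (HW, HW)
    return attn @ v, attn


def cumulative_spectrum(m):
    # Fraction of the total singular-value mass captured by the top-i values.
    # Assumption: singular values are used here instead of eigenvalues.
    s = np.linalg.svd(m, compute_uv=False)
    return np.cumsum(s) / s.sum()


rng = np.random.default_rng(0)
C, H, W = 8, 4, 4
x = rng.standard_normal((C, H, W))
w_q, w_k, w_v = (rng.standard_normal((C, C)) for _ in range(3))
out, attn = channel_self_attention(x, w_q, w_k, w_v)
spectrum = cumulative_spectrum(attn)
```

A curve in `spectrum` that approaches 1 after only a few entries would indicate that the attention map is well approximated by a low-rank matrix. With random weights, as here, the curve only demonstrates the measurement, not the paper's finding.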

Paper Structure (8 sections, 8 equations, 7 figures, 9 tables)


Figures (7)

  • Figure 1: Low-rank nature of self-attention. Average cumulative distribution function of eigenvalues for the self-attention of Restormer [1] (a-d), MST-L [42] (e-g), and MST++ [43] (h-j). For Restormer, the plot averages over image deraining, denoising, and motion deblurring. For MST-L, we perform spectral reconstruction on real and simulated data. For MST++, the NTIRE 2022 challenge dataset for spectral reconstruction is used. Further details and more supporting results are provided in the supplementary material.
  • Figure 2: (a) Input Feature Map. (b) The conventional self-attention approach operates on samples or patches in HW domain. (c) Channel Self-Attention (CSA) operates on channels C, treating each channel as a sample in the sequence and HW as its embedding.
  • Figure 3: (a) Pipeline of the proposed GLMHA. (b) Calibration Network.
  • Figure 4: (a) Conventional self-attention. (b) Linformer low-rank approximation-based self-attention. The generated Keys and Values are projected to low-rank approximations, and self-attention is then calculated. (c) Proposed GLMHA, which generates low-rank Key and Value embeddings from the input feature map in an instance-based, learnable way.
  • Figure 5: Representative qualitative results of HSI reconstruction. Four out of 31 generated spectral images are shown for each model. Best viewed enlarged.
  • ...and 2 more figures
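
The contrast drawn in the Figure 4 caption between conventional self-attention (a) and a Linformer-style low-rank approximation (b) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's implementation: the projection matrices `e` and `f` and the function names are hypothetical, and a single head is used. The instance-guided generation of low-rank Keys and Values in GLMHA (c) is not reproduced, since its details are not given in this section.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def full_attention(q, k, v):
    """Conventional attention: q, k, v are (n, d); the score matrix is (n, n)."""
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v


def linformer_style_attention(q, k, v, e, f):
    """Linformer-style attention (sketch).

    e, f: (r, n) projections (hypothetical names) that compress the n keys
    and values down to r rows, so the score matrix is (n, r) instead of (n, n).
    """
    d = q.shape[-1]
    k_low = e @ k  # (r, d)
    v_low = f @ v  # (r, d)
    return softmax(q @ k_low.T / np.sqrt(d)) @ v_low


rng = np.random.default_rng(0)
n, d, r = 16, 8, 4
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
e, f = rng.standard_normal((r, n)), rng.standard_normal((r, n))

full_out = full_attention(q, k, v)
low_out = linformer_style_attention(q, k, v, e, f)
# With identity projections (r = n) the low-rank variant reduces to full attention.
same_out = linformer_style_attention(q, k, v, np.eye(n), np.eye(n))
```

The projections `e` and `f` add a cost of their own. When `n` is small, that extra work can offset the savings from the smaller score matrix, which is the short-sequence limitation the abstract attributes to methods of this kind.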