AgileIR: Memory-Efficient Group Shifted Windows Attention for Agile Image Restoration

Hongyi Cai, Mohammad Mahdinur Rahman, Mohammad Shahid Akhtar, Jie Li, Jingyu Wu, Zhili Fang

TL;DR

AgileIR combines a group shifted attention mechanism with window attention to sparsely simplify the model architecture. It maintains performance at 32.20 dB on the Set5 evaluation dataset, exceeding other tailor-made efficient methods, and saves over 50% of memory when a large batch size is employed.

Abstract

Image Transformers have shown remarkable success in image restoration tasks. Nevertheless, most transformer-based models are strictly bounded by exorbitant memory occupancy. Our goal is to reduce the memory consumption of Swin Transformer and, at the same time, speed up the model during training. Thus, we introduce AgileIR, a group shifted attention mechanism combined with window attention, which sparsely simplifies the model architecture. We propose Group Shifted Window Attention (GSWA) to decompose Shifted Window Multi-head Self-Attention (SW-MSA) and Window Multi-head Self-Attention (W-MSA) into groups across their attention heads, which shrinks memory usage in backpropagation. In addition, we keep shifted window masking and its shifted learnable biases during training, in order to encourage the model to interact across windows within the channel. We also re-allocate projection parameters to accelerate attention matrix calculation, which we found causes a negligible decrease in performance. In our experiments, compared with our baseline SwinIR and other efficient quantization models, AgileIR maintains performance at 32.20 dB on the Set5 evaluation dataset, exceeding other tailor-made efficient methods, and saves over 50% of memory when a large batch size is employed.
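The idea of splitting window attention into groups across attention heads can be illustrated with a minimal NumPy sketch. This is not the authors' GSWA implementation: the function names, tensor layout, and `num_groups` parameter are assumptions made for illustration, and the shift masking and projection re-allocation described in the abstract are omitted. The sketch only shows that processing heads group by group reproduces ordinary window attention while each group's score tensor is smaller than the full one.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(q, k, v, bias=None):
    # q, k, v: (windows, heads, tokens, head_dim)
    scale = q.shape[-1] ** -0.5
    # Attention scores: (windows, heads, tokens, tokens)
    attn = (q @ k.transpose(0, 1, 3, 2)) * scale
    if bias is not None:
        # bias: (heads, tokens, tokens), broadcast over windows
        attn = attn + bias
    return softmax(attn) @ v

def grouped_window_attention(q, k, v, num_groups, bias=None):
    # Hypothetical helper: split the heads into equal groups and
    # compute window attention one group at a time.
    heads = q.shape[1]
    assert heads % num_groups == 0, "heads must divide evenly into groups"
    step = heads // num_groups
    outputs = []
    for g in range(num_groups):
        s = slice(g * step, (g + 1) * step)
        group_bias = None if bias is None else bias[s]
        outputs.append(window_attention(q[:, s], k[:, s], v[:, s], group_bias))
    # Concatenate group outputs back along the head axis.
    return np.concatenate(outputs, axis=1)
```

Because heads are independent in standard multi-head attention, the grouped result matches the ungrouped one. Whether this reduces training memory in practice depends on what the autograd framework stores for the backward pass and on the authors' actual design, neither of which this sketch models.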

Paper Structure (10 sections, 6 equations, 4 figures, 2 tables)

This paper contains 10 sections, 6 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Comparison of training memory usage on DIV2K between SwinIR-light (blue) and AgileIR (red), measured on an A100 80GB GPU. With AgileIR, training memory drops by 2.23× from 67.52 GB to 30.23 GB at a batch size of 256. SwinIR exceeds the available memory when the training batch size increases to 512.
  • Figure 2: The overall architecture of AgileIR. ASTL denotes the Agile Swin Transformer Layer, and HQ Image Reconstruction consists of a pixel shuffler and one convolutional layer.
  • Figure 3: The Architecture of Group Shifted Window Attention.
  • Figure 4: PSNR comparison of SwinIR-light and SwinIR on the Set5 dataset with different Q, K dimensions.
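
The memory figures quoted in the Figure 1 caption can be cross-checked with a short calculation. The variable names below are illustrative; the two memory values are taken directly from the caption.

```python
# Training memory at batch size 256, as reported in the Figure 1 caption.
swinir_gb = 67.52
agileir_gb = 30.23

# Ratio of baseline memory to AgileIR memory.
reduction_factor = swinir_gb / agileir_gb
# Fraction of the baseline memory that is saved.
memory_saved = 1 - agileir_gb / swinir_gb

print(f"reduction factor: {reduction_factor:.2f}x")  # 2.23x
print(f"memory saved: {memory_saved:.1%}")           # 55.2%
```

The 2.23× reduction corresponds to a saving of roughly 55%, consistent with the "over 50%" stated in the abstract.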