RouteWinFormer: A Route-Window Transformer for Middle-range Attention in Image Restoration

Qifan Li, Tianyi Liang, Xingtao Wang, Xiaopeng Fan

TL;DR

The paper addresses the inefficiency of global attention for image restoration by showing that middle-range context suffices for many degradation types. It introduces RouteWinFormer, a window-based Transformer whose Route-Windows Attention Module (RWAM) dynamically selects nearby windows based on regional similarity. This is complemented by Multi-Scale Structure Regularization (MSR), which guides the sub-scales toward recovering image textures and structures. The method uses a four-scale U-shaped architecture built from IRBlocks that combine RWAM with a Shift-Windows Attention Module (SWAM), and it is trained with a loss that includes MSR and a frequency-domain term. Empirical results across 9 datasets and four restoration tasks demonstrate state-of-the-art performance with improved efficiency, highlighting the practicality of mid-range attention for real-world applications.

Abstract

Transformer models have recently garnered significant attention in image restoration due to their ability to capture long-range pixel dependencies. However, long-range attention often results in computational overhead without practical necessity, as degradation and context are typically localized. Normalized average attention distance across various degradation datasets shows that middle-range attention is enough for image restoration. Building on this insight, we propose RouteWinFormer, a novel window-based Transformer that models middle-range context for image restoration. RouteWinFormer incorporates the Route-Windows Attention Module, which dynamically selects relevant nearby windows based on regional similarity for attention aggregation, efficiently extending the receptive field to a mid-range size. In addition, we introduce Multi-Scale Structure Regularization during training, enabling the sub-scales of the U-shaped network to focus on structural information, while the original scale learns degradation patterns based on generalized image structure priors. Extensive experiments demonstrate that RouteWinFormer outperforms state-of-the-art methods across 9 datasets in various image restoration tasks.
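
The routing idea in the abstract (select relevant nearby windows by regional similarity) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `route_windows`, the mean-pooled region descriptor, the cosine similarity, the `radius` neighborhood, and the exclusion of the query window itself are all assumptions made for illustration.

```python
import numpy as np

def route_windows(feat, window=8, radius=1, top_k=4):
    """For each window, pick the top_k most similar windows within `radius` windows.

    feat: array of shape (H, W, C). Returns a dict mapping a window index (i, j)
    to a list of selected neighbor window indices. All choices here are
    illustrative assumptions, not the paper's exact Router.
    """
    H, W, C = feat.shape
    nh, nw = H // window, W // window
    # Region descriptor (assumption): mean feature of each window -> (nh, nw, C).
    regions = (feat[:nh * window, :nw * window]
               .reshape(nh, window, nw, window, C)
               .mean(axis=(1, 3)))
    # L2-normalize so the dot product below is a cosine similarity.
    regions = regions / (np.linalg.norm(regions, axis=-1, keepdims=True) + 1e-8)

    routes = {}
    for i in range(nh):
        for j in range(nw):
            # Candidate windows: the nearby neighborhood, excluding the window itself.
            cands = [(a, b)
                     for a in range(max(0, i - radius), min(nh, i + radius + 1))
                     for b in range(max(0, j - radius), min(nw, j + radius + 1))
                     if (a, b) != (i, j)]
            sims = np.array([regions[i, j] @ regions[a, b] for a, b in cands])
            order = np.argsort(-sims)[:top_k]  # highest similarity first
            routes[(i, j)] = [cands[t] for t in order]
    return routes

# Usage: a 32x32 feature map with 16 channels gives a 4x4 grid of 8x8 windows.
rng = np.random.default_rng(0)
routes = route_windows(rng.standard_normal((32, 32, 16)), window=8, radius=1, top_k=4)
```

Attention for a window would then be computed over the tokens of that window together with the tokens of its routed windows, so the cost grows with `top_k` rather than with the full image size.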

Paper Structure

This paper contains 13 sections, 13 equations, 7 figures, 7 tables.

Figures (7)

  • Figure 1: Normalized average attention distance across various degradation datasets, where attention intensity is weighted by pixel distance and normalized by image size. A larger value indicates a broader attention area. All datasets show values below 0.3, suggesting that long-range modeling may not be a practical necessity for image restoration.
  • Figure 2: The architecture of RouteWinFormer: (a) Transformer Block with Image Restoration Block (IRBlock) and Feed-Forward Network (FFN), incorporating Route-Windows Attention Module (RWAM) and Shift-Windows Attention Module (SWAM). (b) RWAM for aggregating middle-range contextual information. (c) In RWAM, the Router dynamically selects relevant nearby windows for attention aggregation based on regional similarities.
  • Figure 3: Visual sample of image defocus deblurring on the DPDD dataset.
  • Figure 4: Visual sample of image desnowing on the CSD dataset.
  • Figure 5: Visual sample of image dehazing on the Haze4K dataset.
  • ...and 2 more figures
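
The metric described in the Figure 1 caption (attention intensity weighted by pixel distance, normalized by image size) can be sketched as follows. The caption does not give the exact formula, so this is one plausible reading under stated assumptions: Euclidean pixel distance, row-normalized attention, and normalization by the image diagonal. The function name is hypothetical.

```python
import numpy as np

def normalized_avg_attention_distance(attn, height, width):
    """Attention-weighted mean pixel distance, divided by the image diagonal.

    attn: (N, N) non-negative attention weights with N = height * width, where
    row q holds the weights that query pixel q assigns to every pixel.
    Assumes the image has more than one pixel (non-zero diagonal).
    """
    # (row, col) coordinates of each flattened pixel index.
    ys, xs = np.divmod(np.arange(height * width), width)
    coords = np.stack([ys, xs], axis=1).astype(np.float64)
    # Pairwise Euclidean distances between pixels -> (N, N).
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Make each row sum to 1 so it acts as a distribution over keys.
    attn = attn / attn.sum(axis=1, keepdims=True)
    # Expected distance per query pixel, averaged over all queries.
    mean_dist = (attn * dist).sum(axis=1).mean()
    # Normalization by image size (assumption: the image diagonal).
    return mean_dist / np.hypot(height - 1, width - 1)

# Usage: purely local attention (each pixel attends only to itself) gives 0,
# while uniform global attention gives a larger value.
local_score = normalized_avg_attention_distance(np.eye(16), 4, 4)
global_score = normalized_avg_attention_distance(np.ones((16, 16)), 4, 4)
```

Under this reading, a score near 0 means attention stays close to the query pixel and a score near 1 means it reaches across the whole image, which matches the caption's statement that a larger value indicates a broader attention area.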