Spread Your Wings: A Radial Strip Transformer for Image Deblurring

Duosheng Chen, Shihao Zhou, Jinshan Pan, Jinglei Shi, Lishen Qu, Jufeng Yang

TL;DR

This work tackles motion deblurring by moving beyond Cartesian window-based transformers to a polar-coordinate transformer. The Radial Strip Transformer (RST) integrates Dynamic Radial Embedding (DRE) and Radial Strip Attention Solver (RSAS) in an asymmetric encoder–decoder, with a frequency-domain FFN to preserve context. By modeling rotation and translation motion through radial offsets and angular-aware attention, RST achieves state-of-the-art performance on multiple synthetic and real-world deblurring benchmarks, while maintaining computational efficiency. The approach demonstrates the practical impact of aligning architectural design with the intrinsic motion structure of blur, enabling sharper restoration across diverse datasets, though it acknowledges areas for improving cross-window interactions and real-world generalization.

Abstract

Exploring motion information is important for the motion deblurring task. Recently, window-based transformer approaches have achieved decent performance in image deblurring. Note that the motion causing blurry results is usually composed of translation and rotation movements, whereas the window-shift operation in the Cartesian coordinate system used by window-based transformer approaches only directly explores translation motion in orthogonal directions. Thus, these methods are limited in modeling the rotation part. To alleviate this problem, we introduce a polar coordinate-based transformer, which uses angles and distances to explore rotation motion and translation information together. In this paper, we propose the Radial Strip Transformer (RST), a transformer-based architecture that restores blurry images in a polar coordinate system instead of a Cartesian one. RST contains a dynamic radial embedding (DRE) module that extracts shallow features with a radial deformable convolution. We design a polar mask layer to generate the offsets for the deformable convolution, which reshapes the convolution kernel along the radius to better capture rotation motion information. Furthermore, we propose a radial strip attention solver (RSAS) for deep feature extraction, where the relationship between windows is organized by azimuth and radius. This attention module contains radial strip windows that reweight image features in polar coordinates, which preserves more useful information about rotation and translation motion together for better recovery of sharp images. Experimental results on six synthetic and real-world datasets show that our method performs favorably against other SOTA methods for the image deblurring task.
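To make the idea of organizing windows "by azimuth and radius" concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the pole sits at the image center and that windows are formed by uniformly binning azimuth into sectors and radius into rings; the function name `polar_window_index` and the parameters `num_sectors` and `num_rings` are hypothetical and do not come from the paper.

```python
import numpy as np

def polar_window_index(height, width, num_sectors=8, num_rings=4):
    """Assign every pixel a (sector, ring) index around the image center.

    Assumption: the pole is the image center and bins are uniform in
    azimuth and radius. The paper may partition the plane differently.
    """
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    ys, xs = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    dy, dx = ys - cy, xs - cx

    # Cartesian -> polar: distance to the pole and angle in [0, 2*pi).
    radius = np.hypot(dx, dy)
    azimuth = np.mod(np.arctan2(dy, dx), 2 * np.pi)

    # Uniform binning; np.minimum keeps boundary values in the last bin.
    sector = np.minimum(
        (azimuth / (2 * np.pi) * num_sectors).astype(int), num_sectors - 1
    )
    max_radius = max(radius.max(), 1e-12)  # guard against a 1x1 input
    ring = np.minimum(
        (radius / max_radius * num_rings).astype(int), num_rings - 1
    )
    return sector, ring

sector, ring = polar_window_index(8, 8)
# Pixels sharing the same (sector, ring) pair would form one polar window.
window_id = sector * ring.max().item() + ring
```

Under this partition, a rotation about the pole moves a pixel between sectors while keeping its ring, which is the kind of structure a Cartesian window shift cannot express directly.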

Paper Structure (13 sections, 9 equations, 9 figures, 8 tables)

Figures (9)

  • Figure 1: (a) Illustration of the motion field from the kernel estimation method carbajal2023blind; the red curves describe the motion that causes the blurry images. (b) Demonstration that the motion trajectory causing blurry results can be decomposed into translation and rotation parts: the blue curve consists of the red rotation motion and the yellow translation motion. (c) Illustration of the token relationship modeled under the polar coordinate system, where $\theta_{ab}$ represents the angular relative position between tokens (pixels) A and B.
  • Figure 2: Overview of RST. RST has an asymmetric encoder-decoder architecture in which the encoder module contains only an FFN. The proposed dynamic radial embedding (DRE) module extracts shallow features under the polar coordinate system. (a) Illustration of the radial strip attention solver (RSAS), which consists of window-based attention and angular relative positional encoding.
  • Figure 3: (a) Illustration of dynamic radial embedding. The polar mask and the convolution layer together produce the offsets for the subsequent deformable convolution layer. $N_{1}$ denotes the number of sectors we split into; a final softmax layer normalizes the offsets of each sector. (b) Illustration of the difference between the Swin transformer liu2021swin embedding module (left) and our radial embedding (right). Instead of using a Cartesian system and CNNs to extract the shallow features of the input image, we use a polar coordinate system and radial deformable convolution to capture the features. In our case, the radial embedding strategy performs the convolution operation along the azimuth. White borders denote the divisions of the windows, and the blue and yellow regions show the areas captured along the horizontal direction and the azimuth.
  • Figure 4: (a) Illustration of the polar embedding. Because each polar embedding patch is a sectorial region, its boundary consists of curves and slanting lines rather than horizontal or vertical edges. (b) Illustration of the convolution layer. The green square dots show the convolution kernel, which has a fixed shape (e.g., $3 \times 3$). Because of the slanting edge of the patch, the upper-left square dots of the regular grid kernel fall out of range, so the convolution is ineffective for these pixels. (c) Illustration of the deformable convolution layer. The yellow round dots represent the receptive field of the deformable convolution. Based on the offsets, the receptive field is reshaped to follow the structure of the patches, which is sectorial rather than square. Owing to its adaptability to flexible shapes, the deformable convolution captures shallow features better than a regular convolution within the masked region.
  • Figure 5: The qualitative results on GoPro nah2017deep. The proposed method achieves the clearest result of restoring the characters on the green bus.
  • ...and 4 more figures
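
The angular relative position $\theta_{ab}$ mentioned in the Figure 1(c) caption can be illustrated with a small sketch. This is one plausible reading rather than the paper's definition: it assumes each token has an azimuth measured around a chosen pole and that $\theta_{ab}$ is the wrapped difference of the two azimuths. The helper name `angular_relative_position` and the `center` argument are hypothetical.

```python
import numpy as np

def angular_relative_position(coords, center):
    """Pairwise wrapped azimuth differences between tokens.

    coords: sequence of (y, x) token positions.
    center: (y, x) position of the pole (an assumption of this sketch).
    Returns an (n, n) array whose entry [a, b] is theta_a - theta_b,
    wrapped into [-pi, pi).
    """
    coords = np.asarray(coords, dtype=float)
    dy = coords[:, 0] - center[0]
    dx = coords[:, 1] - center[1]
    theta = np.arctan2(dy, dx)  # azimuth of each token

    diff = theta[:, None] - theta[None, :]
    # Wrap so that angles just either side of the branch cut stay close.
    return (diff + np.pi) % (2 * np.pi) - np.pi

# Token A lies on the +x axis, token B on the +y axis, pole at the origin.
rel = angular_relative_position([(0.0, 1.0), (1.0, 0.0)], center=(0.0, 0.0))
```

Here `rel[1, 0]` is the angle from token A to token B. A table of such differences could then index a learned bias in the same way Cartesian offsets index a relative position bias in window attention, though whether RST does exactly this is not stated in the captions above.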