Heterogeneous window transformer for image denoising
Chunwei Tian, Menghua Zheng, Chia-Wen Lin, Zhiwu Li, David Zhang
TL;DR
HWformer addresses image denoising by bridging long-range pixel interactions and local detail preservation using a heterogeneous window transformer. It introduces heterogeneous global windows (GTEBlock) and a Transformer direction enhancement block (TDEBlock) with horizontal/vertical shifts and a sparse FFN to capture both global context and local neighboring patches without increasing denoising time. The architecture achieves competitive or state-of-the-art performance on synthetic and real-noise datasets while requiring only about 30% of Restormer’s denoising time, making it suitable for mobile devices. Ablation studies validate window size, directional shifts, and sparsity as key factors for performance and efficiency.
Abstract
Deep networks usually improve denoising results by extracting more structural information. However, in pursuing better denoising performance, they may ignore the correlation between pixels in an image. A window transformer can address this problem by using long- and short-distance modeling to let pixels interact. To trade off distance modeling against denoising time, we propose a heterogeneous window transformer (HWformer) for image denoising. HWformer first designs heterogeneous global windows to capture global context information and improve the denoising effect. To bridge long- and short-distance modeling, the global windows are shifted horizontally and vertically, which diversifies the captured information without increasing denoising time. To prevent the information loss caused by treating patches independently, a sparsity idea guides a feed-forward network to extract local information from neighboring patches. The proposed HWformer takes only 30% of the denoising time of the popular Restormer.
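To make the windowing idea concrete, the following is a minimal, dependency-free sketch of partitioning an image into windows of different shapes and of shifting the image horizontally or vertically before partitioning. It is an illustration only, not the authors' implementation: the function names `partition_windows` and `cyclic_shift`, the use of a cyclic (wrap-around) shift, and the toy 4x4 image and window shapes are all assumptions chosen for clarity.

```python
def partition_windows(image, win_h, win_w):
    """Split a 2-D grid (list of rows) into non-overlapping win_h x win_w windows.

    Hypothetical helper: windows are returned in row-major order.
    """
    height, width = len(image), len(image[0])
    if height % win_h or width % win_w:
        raise ValueError("image size must be divisible by the window size")
    windows = []
    for top in range(0, height, win_h):
        for left in range(0, width, win_w):
            windows.append([row[left:left + win_w] for row in image[top:top + win_h]])
    return windows


def cyclic_shift(image, shift_rows=0, shift_cols=0):
    """Shift a 2-D grid down by shift_rows and right by shift_cols, wrapping around.

    Assumption: a cyclic shift is used here; the abstract does not say how
    borders are handled.
    """
    height, width = len(image), len(image[0])
    r = shift_rows % height
    c = shift_cols % width
    rows = image[-r:] + image[:-r] if r else list(image)
    return [row[-c:] + row[:-c] if c else list(row) for row in rows]


# Toy 4x4 "image" whose pixel values are their row-major indices 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]

# Windows of different shapes over the same image (illustrative shapes only).
square = partition_windows(image, 2, 2)   # four 2x2 windows
tall = partition_windows(image, 4, 2)     # two 4x2 windows
wide = partition_windows(image, 2, 4)     # two 2x4 windows

# Shifting before partitioning regroups pixels, so neighbors that were split
# across a window border now fall inside the same window.
h_shifted = partition_windows(cyclic_shift(image, shift_cols=1), 2, 2)
v_shifted = partition_windows(cyclic_shift(image, shift_rows=1), 2, 2)
```

In this toy case, pixels 3 and 0 sit in different 2x2 windows before the horizontal shift and in the same window after it. The partitioning cost is unchanged because the number and size of windows stay the same, which is consistent with the abstract's claim that shifting does not increase denoising time.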
