SnowFormer: Context Interaction Transformer with Scale-awareness for Single Image Desnowing

Sixiang Chen; Tian Ye; Yun Liu; Erkang Chen

TL;DR

SnowFormer tackles single-image desnowing under diverse degradations by integrating scale-aware feature aggregation with a local-to-global context interaction framework. It introduces a Snow Query Generator to produce scale-aware, non-parametric queries that drive cross-attention between global snow cues and local patches, complemented by a Local Interaction module and a degradation-aware Attention Refinement Head. The approach achieves state-of-the-art performance across six synthetic and real-world snow datasets, with substantial PSNR/SSIM gains and competitive computational costs, validated by comprehensive ablations. The work advances task-specific transformer design for low-level vision, enabling robust desnowing in practical applications.

Abstract

Single image desnowing is a challenging image restoration task because snow degradations are varied and complicated. Since prior methods cannot handle them well, we propose a novel transformer, SnowFormer, which uses efficient cross-attention to build local-global context interaction across patches and surpasses existing works that employ local operators or vanilla transformers. Compared to prior desnowing methods and universal image restoration methods, SnowFormer has several benefits. First, unlike the multi-head self-attention in recent image restoration Vision Transformers, SnowFormer incorporates a multi-head cross-attention mechanism to perform local-global context interaction between scale-aware snow queries and local-patch embeddings. Second, the snow queries in SnowFormer are generated by the query generator from aggregated scale-aware features, which are rich in potential clean cues, leading to superior restoration results. Third, SnowFormer outperforms state-of-the-art desnowing networks and prevalent universal image restoration transformers on six synthetic and real-world datasets. The code is released at \url{https://github.com/Ephemeral182/SnowFormer}.
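The cross-attention idea in the abstract can be illustrated with a minimal single-head sketch in plain Python. This is not the authors' implementation: the function names, the toy vectors, and the choice to draw queries from snow queries and keys/values from local-patch embeddings are assumptions made for illustration, and the learned projections, multiple heads, and Snow Query Generator are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(snow_queries, patch_keys, patch_values):
    """Scaled dot-product cross-attention (single head, no learned weights).

    snow_queries: list of query vectors (assumed to carry global snow cues).
    patch_keys / patch_values: per-patch vectors (assumed local embeddings).
    Returns one output vector per query plus the attention weights.
    """
    dim = len(snow_queries[0])
    scale = 1.0 / math.sqrt(dim)
    outputs, weights = [], []
    for q in snow_queries:
        # Similarity of this query to every local patch key.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in patch_keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the patch values, channel by channel.
        channels = len(patch_values[0])
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, patch_values)) for c in range(channels)]
        )
    return outputs, weights


# Hypothetical toy data: 2 queries, 3 local patches, 2 channels.
snow_queries = [[1.0, 0.0], [0.0, 1.0]]
patch_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
patch_values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
outputs, weights = cross_attention(snow_queries, patch_keys, patch_values)
```

Unlike self-attention, where queries, keys, and values all come from the same token set, the queries here come from a separate source, so each output mixes information from every local patch according to how well it matches the query.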

Paper Structure (20 sections, 10 equations, 9 figures, 5 tables)

This paper contains 20 sections, 10 equations, 9 figures, 5 tables.

Introduction
Related Works
Single Image Desnowing
All-in-one Adverse Weather Removal
Vision Transformer for Image Restoration
Proposed Method
Scale-aware Feature Aggregation
Context Interaction
Local Interaction for Degradation Perceiving and Modeling
Local-Global Context Interaction with Scale-Awareness
Attention Refinement Head
Loss Function
Experiments
Implementation Details
Evaluation Metrics and Datasets
...and 5 more sections

Figures (9)

Figure 1: Left: Snow Domain. Snow scenes consist of complicated degradations. Right: Rain Domain. Rain images commonly include regular rain streaks. Red arrows point to typical degradations, indicating that snow degradations are more irregular and vary more in size than rain streaks. Please zoom in for a better view.
Figure 2: Left: Drawbacks in the results of SOTA methods compared with the proposed SnowFormer. (a) Snow scene input. (b-d) Results of existing SOTA desnowing approaches [chen2020jstasr, hdcwnet, zhang2021deep]. (e) Snow removal result of a unified adverse weather architecture [valanarasu2022transweather]. (f) The result of our proposed SnowFormer. (g) Ground truth. As shown, (b) cannot remove the diverse snow degradations because of its divide-and-conquer strategy. In (c), high-frequency details are removed by the network along with the snow degradations. (d) still retains certain snow degradations, and its restoration of details is also flawed. (e) ignores the characteristics of snow degradations, and the unified framework fails to clean up the snow scene. SnowFormer addresses these issues, and its results are closer to the corresponding ground truth. Right: Trade-off between PSNR and parameter count and GFLOPs on CSD [hdcwnet]. SnowFormer substantially surpasses previous methods in both trade-off and performance.
Figure 3: The architecture of SnowFormer for single-image snow removal. SnowFormer is powered by a scale-aware structure incorporating efficient Local Interaction and Local-Global Interaction. The core designs of SnowFormer are: (i) Local Interaction, which performs local-patch feature interaction by a self-attention operation across the pixels of each patch (see "Local Interaction for Degradation Perceiving and Modeling"), (ii) a Snow Query Generator that, driven by aggregated snow features, produces spatially-enriched query-key features for Local-Global Context Interaction (see "Local-Global Context Interaction with Scale-Awareness"), and (iii) Local-Global Context Interaction, which performs cross-attention between snow queries (global) and local patches (local) (see "Local-Global Context Interaction with Scale-Awareness"). We present detailed structures of the convolutional block of the encoder and the transformer block in our Supplementary Materials.
Figure 4: Intra-patch degradation similarity in real snowy samples. Similar snow degradations exist in the same local patch, which motivates us to exploit this natural property to improve the ability to perceive and model degradations. The double arrow indicates similar snow degradations. Please zoom in to see the details better.
Figure 5: A few real samples that verify our motivation for Local-Global Context Interaction. Local patches almost entirely occupied by snow degradations are marked with red rectangles. For such patches, the local clean cues hidden within are insufficient.
...and 4 more figures
