Fourier-RWKV: A Multi-State Perception Network for Efficient Image Dehazing

Lirong Zheng, Yanshan Li, Rui Yu, Kaihao Zhang

TL;DR

This work tackles image dehazing under non-uniform real-world haze with a focus on efficiency. It introduces Fourier-RWKV, a linear-complexity multi-state perception network that fuses spatial deformable perception (DQ-Shift), frequency-domain modeling (Fourier Mix), and semantic-guided feature fusion (SBM). The approach uses a four-level encoder-decoder with FRWKV blocks and a semantic bridge, achieving state-of-the-art restoration across synthetic and real hazy datasets while maintaining lower computational cost. The results demonstrate robust generalization to diverse haze patterns and highlight the method's practicality for real-time or large-scale deployment.

Abstract

Image dehazing is crucial for reliable visual perception, yet it remains highly challenging under real-world non-uniform haze conditions. Although Transformer-based methods excel at capturing global context, their quadratic computational complexity hinders real-time deployment. To address this, we propose Fourier Receptance Weighted Key Value (Fourier-RWKV), a novel dehazing framework based on a Multi-State Perception paradigm. The model achieves comprehensive haze degradation modeling with linear complexity by synergistically integrating three distinct perceptual states: (1) Spatial-form Perception, realized through the Deformable Quad-directional Token Shift (DQ-Shift) operation, which dynamically adjusts receptive fields to accommodate local haze variations; (2) Frequency-domain Perception, implemented within the Fourier Mix block, which extends the core WKV attention mechanism of RWKV from the spatial domain to the Fourier domain, preserving the long-range dependencies essential for global haze estimation while mitigating spatial attenuation; (3) Semantic-relation Perception, facilitated by the Semantic Bridge Module (SBM), which utilizes Dynamic Semantic Kernel Fusion (DSK-Fusion) to precisely align encoder-decoder features and suppress artifacts. Extensive experiments on multiple benchmarks demonstrate that Fourier-RWKV delivers state-of-the-art performance across diverse haze scenarios while significantly reducing computational overhead, establishing a favorable trade-off between restoration quality and practical efficiency. Code is available at: https://github.com/Dilizlr/Fourier-RWKV.
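
The Spatial-form Perception state builds on quad-directional token shifting. The deformable part of DQ-Shift is not specified in this summary, so the sketch below shows only the fixed (non-deformable) quad-directional shift popularized by Vision-RWKV-style models, as an assumption about the baseline that DQ-Shift extends. The function name `quad_shift`, the zero-filled borders, and the equal four-way channel split are illustrative choices, not the authors' code.

```python
import numpy as np

def quad_shift(x):
    """Fixed quad-directional token shift on a feature map of shape (C, H, W).

    The first four channel groups are shifted by one pixel to the right,
    left, down, and up respectively. Vacated border positions are
    zero-filled, and any leftover channels are copied unchanged.
    This is NOT the deformable DQ-Shift from the paper.
    """
    c = x.shape[0]
    g = c // 4  # channels per direction group (assumed equal split)
    out = np.zeros_like(x)
    out[0 * g:1 * g, :, 1:] = x[0 * g:1 * g, :, :-1]   # shift right
    out[1 * g:2 * g, :, :-1] = x[1 * g:2 * g, :, 1:]   # shift left
    out[2 * g:3 * g, 1:, :] = x[2 * g:3 * g, :-1, :]   # shift down
    out[3 * g:4 * g, :-1, :] = x[3 * g:4 * g, 1:, :]   # shift up
    out[4 * g:] = x[4 * g:]                            # remainder untouched
    return out

# Toy input: 4 channels, one per direction, on a 3x3 grid.
x = np.arange(4 * 3 * 3, dtype=float).reshape(4, 3, 3)
y = quad_shift(x)
```

Each output token therefore sees one neighbor per direction at no extra multiply cost. A deformable variant would presumably replace the fixed one-pixel offsets with input-dependent ones, but that mechanism is not described here.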

Paper Structure

This paper contains 25 sections, 16 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Improvement of our Fourier-RWKV over the SOTA approaches. The bubble size represents the number of model parameters. All models are tested on SOTS-Outdoor [li2018benchmarking].
  • Figure 2: Spectral properties of haze. Exchanging the amplitude spectra (a, b) reveals that haze information is primarily encoded in the amplitude spectrum, while the phase spectrum retains most of the structural information. Replacing the DC component (b, c) improves global image contrast, indicating that haze degradation is concentrated in low-frequency regions and demonstrating that a local operation in the frequency domain has a global effect in the spatial domain.
  • Figure 3: The architecture of the Fourier-RWKV. It adopts a classic encoder-decoder framework with four levels, each stacking $N_i$ FRWKV blocks. Each block consists of a Fourier Mix block and a Channel Mix block, both equipped with a deformable quad-directional token shift (DQ-Shift). Fourier Mix integrates spectral sequencing and inversion (Seq and ISeq), converting spectral features into sequences for the Bi-WKV mechanism. The semantic bridge module (SBM) in the skip connections ensures encoder-decoder feature alignment.
  • Figure 4: Illustration of the semantic bridge module (SBM). It dynamically generates multi-scale convolutional kernels based on the semantic relationships between encoder and decoder features. The extracted features are fused by the Kernel Selection Fusion Unit (KSFU) into semantic priors, which are then used to refine the encoder features through DC component replacement. The refined features are then concatenated with the corresponding decoder features to achieve cross-stage semantic alignment.
  • Figure 5: Visual comparisons on synthetic hazy images from the SOTS dataset. Key regions highlighted by red boxes are enlarged in the lower-left corner for clearer comparison.
  • ...and 2 more figures
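
The two spectral operations described in the Figure 2 caption (amplitude exchange and DC component replacement) can be sketched with NumPy's FFT. This is a minimal illustration on a toy grayscale array, not the authors' experiment: the helper names `swap_amplitude` and `replace_dc` are hypothetical, and the "haze" is an assumed contrast compression plus a uniform veil.

```python
import numpy as np

def swap_amplitude(src, ref):
    """Combine the amplitude spectrum of `ref` with the phase spectrum of `src`."""
    src_f = np.fft.fft2(src)
    ref_f = np.fft.fft2(ref)
    mixed = np.abs(ref_f) * np.exp(1j * np.angle(src_f))
    return np.real(np.fft.ifft2(mixed))

def replace_dc(src, ref):
    """Copy only the DC (zero-frequency) coefficient of `ref` into `src`."""
    src_f = np.fft.fft2(src)
    ref_f = np.fft.fft2(ref)
    src_f[0, 0] = ref_f[0, 0]  # a single-coefficient edit in the frequency domain
    return np.real(np.fft.ifft2(src_f))

rng = np.random.default_rng(0)
clear = rng.random((8, 8))
# Toy haze model (an assumption): halve the contrast and add a uniform veil.
hazy = 0.5 * clear + 0.4

dc_fixed = replace_dc(hazy, clear)       # changes every pixel by the same offset
amp_fixed = swap_amplitude(hazy, clear)  # hazy phase + clear amplitude
```

In this toy model the haze leaves the phase untouched, so the amplitude swap recovers the clear array exactly; real haze is not this clean, which is why the caption says the phase retains only "most" of the structure. The DC replacement edits one frequency coefficient yet shifts every pixel, which is the global effect the caption refers to.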