Vision Transformer with Key-select Routing Attention for Single Image Dehazing

Lihan Tong; Weijia Li; Qingxia Yang; Liyuan Chen; Peng Chen

TL;DR

Ksformer tackles single-image dehazing by integrating a content-aware Multi-scale Key-select Routing Attention (MKRA) with a Lightweight Frequency Processing Module (LFPM). The MKRA selects the most informative regions via a top-k mechanism across multi-scale windows, enabling long-range dependencies with reduced computation, while the LFPM enhances high-frequency details using minimal parameters. In experiments on synthetic and real hazy datasets, Ksformer delivers state-of-the-art PSNR/SSIM with a compact 5.8M-parameter model, and ablation studies confirm the contribution of MKRA and LFPM. The approach demonstrates that combining content-aware spatial attention with spectral feature emphasis can achieve high dehazing quality efficiently, though further work is needed to reduce GFLOPs for embedded deployment.
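To make the top-k selection idea concrete, here is a minimal NumPy sketch in which each query attends only to its `top_k` highest-scoring keys. It is an illustration under simplifying assumptions, not the paper's MKRA: it works on flat token sequences rather than multi-scale windows, and the function name `topk_key_select_attention` and its parameters are hypothetical.

```python
import numpy as np


def topk_key_select_attention(q, k, v, top_k):
    """Scaled dot-product attention restricted to the top_k keys per query.

    q: (n_q, d), k: (n_k, d), v: (n_k, d_v), all float arrays.
    Hypothetical helper for illustration; not the authors' implementation.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (n_q, n_k) similarity scores

    # Column indices of the top_k largest scores in each query row.
    keep = np.argpartition(scores, -top_k, axis=-1)[:, -top_k:]

    # Additive mask: 0 for kept keys, -inf for discarded ones.
    mask = np.full_like(scores, -np.inf)
    np.put_along_axis(mask, keep, 0.0, axis=-1)

    # Numerically stable softmax over the kept keys only.
    masked = scores + mask
    masked -= masked.max(axis=-1, keepdims=True)
    weights = np.exp(masked)
    weights /= weights.sum(axis=-1, keepdims=True)

    return weights @ v, weights


rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(10, 8))
v = rng.normal(size=(10, 5))
out, w = topk_key_select_attention(q, k, v, top_k=3)
```

This sketch still computes the full score matrix before masking, so it shows the selection rule only. The computational savings described above would require gathering the selected keys and values before the expensive steps.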

Abstract

We present Ksformer, which uses Multi-scale Key-select Routing Attention (MKRA) to intelligently select key areas through multi-channel, multi-scale windows with a top-k operator, and a Lightweight Frequency Processing Module (LFPM) to enhance high-frequency features. Ksformer outperforms other dehazing methods in the reported tests.
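As a rough illustration of what enhancing high-frequency features can mean, the sketch below boosts the high-frequency part of a 2D feature map with a fixed mask in the Fourier domain. It is an assumption-laden stand-in, not the paper's LFPM: the actual module is learned and its design is not described in this summary, and the name `emphasize_high_frequencies` and the `cutoff` and `gain` parameters are hypothetical.

```python
import numpy as np


def emphasize_high_frequencies(feat, cutoff=0.25, gain=1.5):
    """Scale spatial frequencies above `cutoff` (cycles per sample) by `gain`.

    feat: (H, W) float array. Hypothetical helper for illustration only;
    a fixed mask stands in for whatever the learned module actually does.
    """
    h, w = feat.shape
    spec = np.fft.fft2(feat)

    # Frequency coordinates of each FFT bin, in cycles per sample.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fy ** 2 + fx ** 2)

    # Leave low frequencies untouched, amplify the rest.
    scale = np.where(radius > cutoff, gain, 1.0)

    # The mask is symmetric, so the result is real up to rounding error.
    return np.fft.ifft2(spec * scale).real


feature_map = np.random.default_rng(0).normal(size=(16, 16))
sharpened = emphasize_high_frequencies(feature_map)
```

A constant map contains only the zero-frequency component and passes through unchanged, while fine textures and edges are amplified by `gain`.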

Paper Structure (11 sections, 5 equations, 4 figures, 2 tables)

Introduction
Method
Image Dehazing
Multi-scale Key-select Routing Attention
Lightweight Frequency Processing Module
Multi-scale Key-select Routing Attention Module
Experiments
Implementation Details
Quantitative and Qualitative Experiments
Ablation Study
Conclusion

Figures (4)

Figure 1: The architecture of the proposed Ksformer.
Figure 2: (a) Architecture of the proposed MKRAM. (b) Architecture of the proposed LFPM.
Figure 3: Visual comparison of results on the RTTS dataset (Li et al., 2019). Zoom in for best view.
Figure 4: Visual comparison of results on the Haze4K dataset (Liu et al., 2021). Zoom in for best view.
