Restoring Images in Adverse Weather Conditions via Histogram Transformer
Shangquan Sun, Wenqi Ren, Xinwei Gao, Rui Wang, Xiaochun Cao
TL;DR
This work tackles image restoration under diverse adverse weather with a unified model. It introduces Histoformer, a histogram-based transformer that uses dynamic-range histogram self-attention (DHSA) and a Dual-scale Gated Feed-Forward (DGFF) to model weather-induced degradation across global and local ranges. Training is complemented by a correlation-aware loss: the Pearson correlation-based term $\mathcal{L}_{cor} = \tfrac{1}{2}(1-\rho(I^{hq},I^{gt}))$, where $\rho$ is the Pearson correlation coefficient between the restored image $I^{hq}$ and the ground truth $I^{gt}$, is added to the reconstruction loss as $\mathcal{L}=\mathcal{L}_{rec}+\alpha\mathcal{L}_{cor}$ with weight $\alpha$, guiding the reconstruction to preserve the intensity ordering of the ground truth. Experiments on Snow100K, Raindrop, and Outdoor-Rain show state-of-the-art performance, and deweathering real-world images improves downstream detection. The method provides a single-model solution for all-weather image restoration, and its code is publicly released.
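The loss above can be illustrated with a minimal NumPy sketch. It assumes that $\rho$ is computed over all pixels of a single image and that $\mathcal{L}_{rec}$ is an L1 loss; `pearson_loss`, `total_loss`, and the `eps` stabilizer are hypothetical names chosen here, and $\alpha$ is left to the caller because its value is not given in this summary.

```python
import numpy as np


def pearson_loss(pred: np.ndarray, target: np.ndarray, eps: float = 1e-8) -> float:
    """L_cor = 0.5 * (1 - rho), with rho the Pearson correlation over all pixels.

    Hypothetical helper; eps is an assumed stabilizer against zero variance.
    """
    p = pred.astype(np.float64).ravel()
    t = target.astype(np.float64).ravel()
    p = p - p.mean()  # center both signals
    t = t - t.mean()
    rho = float(p @ t) / (np.sqrt(float(p @ p) * float(t @ t)) + eps)
    return 0.5 * (1.0 - rho)  # 0 for perfect positive correlation, 1 for perfect negative


def total_loss(pred: np.ndarray, target: np.ndarray, alpha: float) -> float:
    """L = L_rec + alpha * L_cor, assuming L_rec is the mean absolute (L1) error."""
    l_rec = float(np.abs(pred.astype(np.float64) - target.astype(np.float64)).mean())
    return l_rec + alpha * pearson_loss(pred, target)


if __name__ == "__main__":
    gt = np.arange(12, dtype=np.float64).reshape(3, 4)
    print(pearson_loss(2.0 * gt + 3.0, gt))  # ~0: a positive affine change keeps rho = 1
    print(pearson_loss(-gt, gt))             # ~1: reversed intensity order
```

Because $\rho$ is unchanged by a positive affine rescaling of the output, $\mathcal{L}_{cor}$ alone cannot pin down absolute intensities; in this sketch the L1 term plays that role, while the correlation term penalizes outputs whose pixels are not linearly associated with the ground truth.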
Abstract
Transformer-based methods for image restoration in adverse weather have made significant progress. Most of them apply self-attention along the channel dimension or within spatially fixed-range blocks to reduce the computational load. However, this compromise limits their ability to capture long-range spatial features. Inspired by the observation that weather-induced degradation factors mainly cause similar occlusion and brightness, we propose an efficient Histogram Transformer (Histoformer) for restoring images affected by adverse weather. It is powered by a mechanism dubbed histogram self-attention, which sorts spatial features and segments them into intensity-based bins. Self-attention is then applied across bins or within each bin, so that the model selectively focuses on spatial features of dynamic range and jointly processes similarly degraded pixels that lie far apart. To boost histogram self-attention, we present a dynamic-range convolution that enables conventional convolution to operate over similar pixels rather than neighboring pixels. We also observe that common pixel-wise losses neglect the linear association and correlation between the output and the ground truth. Thus, we propose to use the Pearson correlation coefficient as a loss function that enforces the recovered pixels to follow the same order as the ground truth. Extensive experiments demonstrate the efficacy and superiority of the proposed method. The code has been released on GitHub.
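The binning step and the dynamic-range convolution described in the abstract can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions, not the released Histoformer code: features are a flattened (N, C) array, each pixel's sorting key is its channel mean, bins have equal size, the query, key, and value projections are identity maps, and `histogram_self_attention` and `dynamic_range_conv1d` are hypothetical names.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def histogram_self_attention(feat: np.ndarray, num_bins: int, mode: str = "within") -> np.ndarray:
    """Sort pixels by intensity, split them into equal-size bins, attend, then unsort.

    feat: (N, C) features for N spatial positions. Assumptions: the sort key is the
    channel mean and Q = K = V = feat (no learned projections).
    """
    n, c = feat.shape
    if n % num_bins:
        raise ValueError("N must be divisible by num_bins in this sketch")
    order = np.argsort(feat.mean(axis=1), kind="stable")  # dark -> bright
    bins = feat[order].reshape(num_bins, n // num_bins, c)  # (B, N/B, C)
    if mode == "within":
        # Pixels in one bin attend to each other, however far apart they are spatially.
        scores = bins @ bins.transpose(0, 2, 1) / np.sqrt(c)  # (B, N/B, N/B)
        out = softmax(scores) @ bins
    elif mode == "across":
        # Each bin becomes one token; bins attend to each other.
        tokens = bins.reshape(num_bins, -1)  # (B, N/B * C)
        scores = tokens @ tokens.T / np.sqrt(tokens.shape[1])  # (B, B)
        out = (softmax(scores) @ tokens).reshape(bins.shape)
    else:
        raise ValueError("mode must be 'within' or 'across'")
    restored = np.empty((n, c), dtype=out.dtype)
    restored[order] = out.reshape(n, c)  # scatter back to original spatial positions
    return restored


def dynamic_range_conv1d(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Convolve a flattened single-channel map along its intensity-sorted order,
    so the kernel mixes pixels of similar value rather than spatial neighbors.
    Assumes len(x) >= len(kernel) so that mode="same" keeps length len(x)."""
    order = np.argsort(x, kind="stable")
    y_sorted = np.convolve(x[order], kernel, mode="same")
    y = np.empty_like(y_sorted)
    y[order] = y_sorted  # undo the sort
    return y
```

With `num_bins=N` every bin holds one pixel, so within-bin attention reduces to the identity; larger bins let spatially distant pixels of similar brightness exchange information. Likewise, with a 3-tap averaging kernel, `dynamic_range_conv1d` blends each pixel with the pixels closest to it in value, not with its spatial neighbors.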
