Dual-Hybrid Attention Network for Specular Highlight Removal
Xiaojiao Guo, Xuhang Chen, Shenghong Luo, Shuqiang Wang, Chi-Man Pun
TL;DR
DHAN-SHR addresses specular highlight removal without auxiliary priors or extra supervision, using a dual-hybrid attention framework that jointly models local spatial-spectral detail and global context. It combines an Adaptive Local Hybrid-Domain Dual Attention Transformer (L-HD-DAT), with Pixel-wise and Channel-wise Spatial-Spectral Shifting Window Attention plus a Frequency Processor, and an Adaptive Global Dual Attention Transformer (G-DAT) that fuses global contextual cues. A hybrid benchmark combining the PSD, SHIQ, and SSHR datasets is proposed and used to retrain 18 state-of-the-art methods for a fair comparison; DHAN-SHR achieves superior quantitative and qualitative results, particularly on real-world PSD and SSHR data. The work advances practical, prior-free specular highlight removal and provides a benchmark dataset and ablation evidence supporting dual-hybrid attention for image enhancement tasks.
Abstract
Specular highlight removal plays a pivotal role in multimedia applications, as it enhances the quality and interpretability of images and videos, ultimately improving the performance of downstream tasks such as content-based retrieval, object recognition, and scene understanding. Despite significant advances in deep learning-based methods, current state-of-the-art approaches often rely on additional priors or supervision, limiting their practicality and generalization capability. In this paper, we propose the Dual-Hybrid Attention Network for Specular Highlight Removal (DHAN-SHR), an end-to-end network that introduces novel hybrid attention mechanisms to effectively capture and process information across different scales and domains without relying on additional priors or supervision. DHAN-SHR consists of two key components: the Adaptive Local Hybrid-Domain Dual Attention Transformer (L-HD-DAT) and the Adaptive Global Dual Attention Transformer (G-DAT). The L-HD-DAT captures local inter-channel and inter-pixel dependencies while incorporating spectral domain features, enabling the network to effectively model the complex interactions between specular highlights and the underlying surface properties. The G-DAT models global inter-channel relationships and long-distance pixel dependencies, allowing the network to propagate contextual information across the entire image and generate more coherent and consistent highlight-free results. To evaluate the performance of DHAN-SHR and facilitate future research in this area, we compile a large-scale benchmark dataset comprising a diverse range of images with varying levels of specular highlights. Through extensive experiments, we demonstrate that DHAN-SHR outperforms 18 state-of-the-art methods both quantitatively and qualitatively, setting a new standard for specular highlight removal in multimedia applications.
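The distinction the abstract draws between inter-pixel and inter-channel dependencies, combined with spectral-domain processing, can be illustrated with a minimal NumPy sketch. This is not the DHAN-SHR implementation: the function names, the absence of learned query/key/value projections, the omission of shifting windows, and the scalar `gain` used in place of a learned frequency processor are all simplifying assumptions made for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def pixel_attention(feat):
    """Inter-pixel attention: each of the H*W positions attends to all others.

    Assumption: Q = K = V = the input tokens (no learned projections).
    """
    c, h, w = feat.shape
    tokens = feat.reshape(c, h * w).T            # (HW, C), one token per pixel
    scores = tokens @ tokens.T / np.sqrt(c)      # (HW, HW) pixel-to-pixel affinity
    out = softmax(scores, axis=-1) @ tokens      # (HW, C)
    return out.T.reshape(c, h, w)

def channel_attention(feat):
    """Inter-channel attention: each of the C channels attends to all others."""
    c, h, w = feat.shape
    tokens = feat.reshape(c, h * w)              # (C, HW), one token per channel
    scores = tokens @ tokens.T / np.sqrt(h * w)  # (C, C) channel-to-channel affinity
    out = softmax(scores, axis=-1) @ tokens      # (C, HW)
    return out.reshape(c, h, w)

def frequency_filter(feat, gain=1.0):
    """Hypothetical stand-in for a frequency processor.

    Transforms each channel to the Fourier domain, scales the spectrum by
    `gain` (a scalar or an array broadcastable to the rFFT shape), and
    transforms back.
    """
    spec = np.fft.rfft2(feat, axes=(-2, -1))
    return np.fft.irfft2(spec * gain, s=feat.shape[-2:], axes=(-2, -1))

def dual_attention_block(feat, gain=1.0):
    """Sum the two attention branches, filter in frequency, add a residual."""
    mixed = pixel_attention(feat) + channel_attention(feat)
    return feat + frequency_filter(mixed, gain)

# Example: a random 4-channel 8x8 feature map keeps its shape through the block.
rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 8, 8))
out = dual_attention_block(feat)
```

In this sketch the pixel branch costs O((HW)^2) and the channel branch O(C^2) in attention-matrix size, which is one plausible reason to restrict pixel attention to local windows while letting channel attention operate more broadly; the paper's actual design choices may differ.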
