Dual-Hybrid Attention Network for Specular Highlight Removal

Xiaojiao Guo, Xuhang Chen, Shenghong Luo, Shuqiang Wang, Chi-Man Pun

TL;DR

DHAN-SHR addresses specular highlight removal without auxiliary priors by introducing a dual-hybrid attention framework that jointly models local spectral-spatial details and global context. The method combines an Adaptive Local Hybrid-Domain Dual Attention Transformer (L-HD-DAT), built from Pixel-wise and Channel-wise Spatial-Spectral Shifting Window Attentions plus a Frequency Processor, with an Adaptive Global Dual Attention Transformer (G-DAT) that fuses global contextual cues. A hybrid benchmark combining the PSD, SHIQ, and SSHR datasets is proposed and used to retrain 18 state-of-the-art methods for fair comparison; on this benchmark DHAN-SHR achieves superior quantitative and qualitative results, particularly on real-world PSD and SSHR data. The work advances practical, prior-free specular highlight removal and provides a comprehensive dataset and ablation evidence supporting the effectiveness of dual-hybrid attention for image enhancement tasks.

Abstract

Specular highlight removal plays a pivotal role in multimedia applications, as it enhances the quality and interpretability of images and videos, ultimately improving the performance of downstream tasks such as content-based retrieval, object recognition, and scene understanding. Despite significant advances in deep learning-based methods, current state-of-the-art approaches often rely on additional priors or supervision, limiting their practicality and generalization capability. In this paper, we propose the Dual-Hybrid Attention Network for Specular Highlight Removal (DHAN-SHR), an end-to-end network that introduces novel hybrid attention mechanisms to effectively capture and process information across different scales and domains without relying on additional priors or supervision. DHAN-SHR consists of two key components: the Adaptive Local Hybrid-Domain Dual Attention Transformer (L-HD-DAT) and the Adaptive Global Dual Attention Transformer (G-DAT). The L-HD-DAT captures local inter-channel and inter-pixel dependencies while incorporating spectral domain features, enabling the network to effectively model the complex interactions between specular highlights and the underlying surface properties. The G-DAT models global inter-channel relationships and long-distance pixel dependencies, allowing the network to propagate contextual information across the entire image and generate more coherent and consistent highlight-free results. To evaluate the performance of DHAN-SHR and facilitate future research in this area, we compile a large-scale benchmark dataset comprising a diverse range of images with varying levels of specular highlights. Through extensive experiments, we demonstrate that DHAN-SHR outperforms 18 state-of-the-art methods both quantitatively and qualitatively, setting a new standard for specular highlight removal in multimedia applications.
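
To make the distinction between inter-pixel and inter-channel dependencies concrete, the following is a minimal NumPy sketch of the two attention axes applied to a single local window. It is an illustration under simplifying assumptions, not the DHAN-SHR implementation: there are no learned query/key/value projections, no window shifting, and no frequency-domain branch, and the function names and the fixed mixing weight `alpha` are hypothetical stand-ins for whatever adaptive fusion the network actually learns.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def pixel_attention(window):
    """Inter-pixel attention: the attention map is N x N.

    window: array of shape (N, C), N pixels in the window, C channels.
    Simplification: queries, keys and values are all the raw input.
    """
    n, c = window.shape
    scores = window @ window.T / np.sqrt(c)   # (N, N) pixel-to-pixel similarity
    return softmax(scores, axis=-1) @ window  # (N, C)


def channel_attention(window):
    """Inter-channel attention: the attention map is C x C."""
    n, c = window.shape
    scores = window.T @ window / np.sqrt(n)          # (C, C) channel-to-channel similarity
    return (softmax(scores, axis=-1) @ window.T).T   # back to (N, C)


def dual_attention(window, alpha=0.5):
    """Blend both branches; alpha is a hypothetical fixed weight (assumption)."""
    return alpha * pixel_attention(window) + (1.0 - alpha) * channel_attention(window)


rng = np.random.default_rng(0)
window = rng.standard_normal((16, 8))  # e.g. a 4x4 window with 8 channels
out = dual_attention(window)
print(out.shape)  # (16, 8)
```

The pixel branch costs O(N²) per window while the channel branch costs O(C²), which is one reason pixel-wise attention is typically restricted to local windows whereas channel-wise attention can be applied at a global scale.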

Paper Structure

This paper contains 24 sections, 10 equations, 8 figures, 6 tables, 1 algorithm.

Figures (8)

  • Figure 1: The overall architecture of our proposed Dual-Hybrid Attention Network for Specular Highlight Removal (DHAN-SHR).
  • Figure 2: Illustration of the window shifting approach and the attention mask applied to the pixel-wise shifting window attention.
  • Figure 3: Visual comparative analysis of our method against leading SOTA approaches, highlighting our superior ability to remove specular highlights while preserving the original image's color tone, structure, and crucial details, such as text clarity on reflective surfaces.
  • Figure 4: Comprehensive visual comparison. (a) Input specular highlight image, (b) Tan [10], (c) Yoon [31], (d) Shen [11], (e) Shen [12], (f) Yang [13], (g) Shen [14], (h) Akashi [15], (i) Huo [32], (j) Fu [18], (k) Yamamoto [19], (l) Saha [20], (m) SLRR [22], (n) JSHDR [6], (o) SpecularityNet [5], (p) MG-CycleGAN [26], (q) Wu [25], (r) TSHRNet [7], (s) AHA [28], (t) Ours, (u) GT diffuse image. The reader is encouraged to zoom in.
  • Figure 5: Comprehensive visual comparison. (a) Input specular highlight image, (b) Tan [10], (c) Yoon [31], (d) Shen [11], (e) Shen [12], (f) Yang [13], (g) Shen [14], (h) Akashi [15], (i) Huo [32], (j) Fu [18], (k) Yamamoto [19], (l) Saha [20], (m) SLRR [22], (n) JSHDR [6], (o) SpecularityNet [5], (p) MG-CycleGAN [26], (q) Wu [25], (r) TSHRNet [7], (s) AHA [28], (t) Ours, (u) GT diffuse image. The reader is encouraged to zoom in.
  • ...and 3 more figures