DMSSN: Distilled Mixed Spectral-Spatial Network for Hyperspectral Salient Object Detection
Haolin Qin, Tingfa Xu, Peifu Liu, Jingxuan Xu, Jianan Li
TL;DR
This work tackles hyperspectral salient object detection by jointly addressing spectral information preservation and spectral-spatial feature exploitation. It introduces DMSSN, a framework that couples Distilled Spectral Encoding (a GMM-guided spectral homogenization plus teacher-student autoencoder) with a Mixed Spectral-Spatial Transformer (MSST) backbone to learn spectral-spatial saliency representations efficiently. The authors also present HSOD-BIT, a large, high-resolution HSOD dataset with challenging scenes to enable robust deep learning. Empirical results show state-of-the-art performance on HSOD-BIT and HS-SOD, with strong efficiency and clear qualitative advantages in attention localization and saliency map sharpness, underscoring DMSSN’s practical impact for remote sensing and related HSOD tasks.
Abstract
Hyperspectral salient object detection (HSOD) has exhibited remarkable promise across various applications, particularly in intricate scenarios where conventional RGB-based approaches fall short. Despite the considerable progress in HSOD method advancements, two critical challenges require immediate attention. Firstly, existing hyperspectral data dimension reduction techniques incur a loss of spectral information, which adversely affects detection accuracy. Secondly, previous methods insufficiently harness the inherent distinctive attributes of hyperspectral images (HSIs) during the feature extraction process. To address these challenges, we propose a novel approach termed the Distilled Mixed Spectral-Spatial Network (DMSSN), comprising a Distilled Spectral Encoding process and a Mixed Spectral-Spatial Transformer (MSST) feature extraction network. The encoding process utilizes knowledge distillation to construct a lightweight autoencoder for dimension reduction, striking a balance between robust encoding capabilities and low computational costs. The MSST extracts spectral-spatial features through multiple attention head groups, collaboratively enhancing its resistance to intricate scenarios. Moreover, we have created a large-scale HSOD dataset, HSOD-BIT, to tackle the issue of data scarcity in this field and meet the fundamental data requirements of deep network training. Extensive experiments demonstrate that our proposed DMSSN achieves state-of-the-art performance on multiple datasets. We will soon make the code and dataset publicly available on https://github.com/anonymous0519/HSOD-BIT.
