DMSSN: Distilled Mixed Spectral-Spatial Network for Hyperspectral Salient Object Detection

Haolin Qin, Tingfa Xu, Peifu Liu, Jingxuan Xu, Jianan Li

TL;DR

This work tackles hyperspectral salient object detection by jointly addressing spectral information preservation and spectral-spatial feature exploitation. It introduces DMSSN, a framework that couples Distilled Spectral Encoding (a GMM-guided spectral homogenization plus teacher-student autoencoder) with a Mixed Spectral-Spatial Transformer (MSST) backbone to learn spectral-spatial saliency representations efficiently. The authors also present HSOD-BIT, a large, high-resolution HSOD dataset with challenging scenes to enable robust deep learning. Empirical results show state-of-the-art performance on HSOD-BIT and HS-SOD, with strong efficiency and clear qualitative advantages in attention localization and saliency map sharpness, underscoring DMSSN’s practical impact for remote sensing and related HSOD tasks.

Abstract

Hyperspectral salient object detection (HSOD) has shown remarkable promise across various applications, particularly in complex scenarios where conventional RGB-based approaches fall short. Despite considerable progress in HSOD methods, two critical challenges remain. First, existing dimension reduction techniques for hyperspectral data lose spectral information, which degrades detection accuracy. Second, previous methods do not fully exploit the distinctive attributes of hyperspectral images (HSIs) during feature extraction. To address these challenges, we propose a novel approach termed the Distilled Mixed Spectral-Spatial Network (DMSSN), comprising a Distilled Spectral Encoding process and a Mixed Spectral-Spatial Transformer (MSST) feature extraction network. The encoding process uses knowledge distillation to construct a lightweight autoencoder for dimension reduction, balancing robust encoding capability against low computational cost. The MSST extracts spectral-spatial features through multiple attention head groups, which together improve its robustness in complex scenarios. Moreover, we have created a large-scale HSOD dataset, HSOD-BIT, to address data scarcity in this field and meet the fundamental data requirements of deep network training. Extensive experiments demonstrate that the proposed DMSSN achieves state-of-the-art performance on multiple datasets. We will soon make the code and dataset publicly available at https://github.com/anonymous0519/HSOD-BIT.
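
The abstract does not give the training objective for the distilled autoencoder, so the following is only a minimal sketch of one common way to distill a lightweight student encoder from a larger teacher: a reconstruction term plus a term that pulls the student's latent code toward the teacher's. The function names, the weight `alpha`, the toy band count, and the linear toy encoders are all assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((a - b) ** 2))

def distillation_loss(x, student_code, student_recon, teacher_code, alpha=0.5):
    """Hypothetical distillation objective (not the paper's stated loss).

    recon_term   : how well the student reconstructs the input spectra
    distill_term : how closely the student's code matches the teacher's
    alpha        : assumed weight balancing the two terms
    """
    recon_term = mse(student_recon, x)
    distill_term = mse(student_code, teacher_code)
    return recon_term + alpha * distill_term

# Toy data: 4 pixels with 32 spectral bands, reduced to 3 channels.
rng = np.random.default_rng(0)
x = rng.random((4, 32))

# Stand-ins for a frozen teacher encoder and a linear student autoencoder.
w_teacher = rng.standard_normal((32, 3)) * 0.1
w_student = rng.standard_normal((32, 3)) * 0.1

teacher_code = np.tanh(x @ w_teacher)      # teacher's low-dimensional code
student_code = x @ w_student               # student's low-dimensional code
student_recon = student_code @ w_student.T # student's reconstruction

loss = distillation_loss(x, student_code, student_recon, teacher_code)
print(f"toy distillation loss: {loss:.4f}")
```

In a real training loop only the student's parameters would be updated, with the teacher's codes treated as fixed targets.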

Paper Structure (36 sections, 20 equations, 12 figures, 8 tables)

Figures (12)

  • Figure 1: Comparison of the predicted results from RGB-based and HSI-based methods under challenging conditions, such as a similar background, uneven illumination, and overexposure. HSI-based methods prioritize spectral salience, leading to more dependable detection results.
  • Figure 2: Under overexposure, the foreground objects and background may appear similar in color, but their spectral curves exhibit significant differences, enabling their distinguishability.
  • Figure 3: The overall architecture of the proposed Distilled Mixed Spectral-Spatial Network (DMSSN) is shown in the top part of the figure. The bottom part illustrates the Mixed Spectral-Spatial Transformer (MSST) block, where queries (Q) are obtained through linear projection, and keys (K) and values (V) are generated through dual branches with spectral and spatial features, respectively. The circle symbols with sum, cross, and C represent pixel-wise addition, Hadamard product, and feature concatenation, respectively.
  • Figure 4: Network architecture for autoencoders. The left part illustrates the teacher autoencoder, which consists of encoding blocks and convolutional layers. The encoding block implements simultaneous extraction and fusion of spatial and spectral features. The right part provides the details of the student autoencoder, composed only of convolutional layers. The red dashed line linking the autoencoders indicates the distillation process.
  • Figure 5: Examples of pseudo-color images and ground truth annotations from the HSOD-BIT dataset. Some of the collected scenes in this dataset cannot be adequately processed by RGB methods, which highlights the need for hyperspectral data in the HSOD task.
  • ...and 7 more figures
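
The Figure 3 caption describes queries from a linear projection and keys and values from dual spectral and spatial branches. The sketch below is one possible reading of that description, not the authors' implementation: it assumes two attention head groups that share the same queries, one attending over spectral-branch features and one over spatial-branch features, with the two outputs concatenated. The function names, parameter dictionary keys, and toy shapes are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention for a single head."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def mixed_attention(tokens, spectral_feat, spatial_feat, params):
    """Assumed mixed block: shared queries, branch-specific keys/values."""
    q = tokens @ params["wq"]  # queries via linear projection
    out_spec = attention(q,
                         spectral_feat @ params["wk_spec"],
                         spectral_feat @ params["wv_spec"])
    out_spat = attention(q,
                         spatial_feat @ params["wk_spat"],
                         spatial_feat @ params["wv_spat"])
    # Concatenate the two head groups along the channel axis.
    return np.concatenate([out_spec, out_spat], axis=-1)

# Toy inputs: 6 tokens with 8 channels each.
rng = np.random.default_rng(0)
n_tokens, dim = 6, 8
tokens = rng.standard_normal((n_tokens, dim))
spectral_feat = rng.standard_normal((n_tokens, dim))  # stand-in spectral branch output
spatial_feat = rng.standard_normal((n_tokens, dim))   # stand-in spatial branch output
names = ["wq", "wk_spec", "wv_spec", "wk_spat", "wv_spat"]
params = {name: rng.standard_normal((dim, dim)) * 0.1 for name in names}

out = mixed_attention(tokens, spectral_feat, spatial_feat, params)
print(out.shape)
```

How the branches actually produce spectral and spatial features, and how the addition and Hadamard product shown in the figure are applied, is not recoverable from the caption alone and is left out here.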