CSDNet: Detect Salient Object in Depth-Thermal via A Lightweight Cross Shallow and Deep Perception Network

Xiaotong Yu; Ruihan Xie; Zhihe Zhao; Chang-Wen Chen

TL;DR

CSDNet tackles the inefficiency and noise inherent in multi-modality perceptual systems by exploiting low-coherence depth-thermal data for salient object detection. It introduces a cross shallow and deep perception framework comprising CFARSP for shallow prescreening, ICAN for deep semantic coherence, and SAMAEP to map depth-thermal features into a generalized feature space via SAM guidance. On the VDT-2048 dataset, CSDNet achieves competitive or superior performance relative to RGB-D and RGB-T methods and approaches RGB-D-T baselines while delivering substantial efficiency gains (≈5.97× faster and ≈0.0036× FLOPs). These results demonstrate effective integration of depth-thermal information with reduced computational burden, making the approach well-suited for edge devices and privacy-conscious mobile robotics under challenging lighting conditions.
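The summary above names a CFAR-based prescreening stage but does not describe it. As a rough illustration only, the sketch below shows a generic cell-averaging detector, assuming that CFAR here refers to the constant false alarm rate technique from radar signal processing. The function name `ca_cfar_2d` and the parameters `guard`, `train`, and `scale` are hypothetical choices for this example; this is not the authors' CFARSP module.

```python
def ca_cfar_2d(image, guard=1, train=2, scale=3.0):
    """Generic cell-averaging CFAR on a 2D grid (illustrative, not CFARSP).

    A pixel is flagged when it exceeds `scale` times the mean of the
    training cells around it. Cells within `guard` of the pixel are
    excluded so a bright object does not raise its own threshold.
    """
    rows, cols = len(image), len(image[0])
    half = guard + train
    mask = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total, count = 0.0, 0
            for dr in range(-half, half + 1):
                for dc in range(-half, half + 1):
                    # Skip the cell under test and its guard ring.
                    if abs(dr) <= guard and abs(dc) <= guard:
                        continue
                    rr, cc = r + dr, c + dc
                    # Use only training cells that lie inside the image.
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += image[rr][cc]
                        count += 1
            if count and image[r][c] > scale * (total / count):
                mask[r][c] = 1
    return mask


if __name__ == "__main__":
    # Hypothetical 9x9 "thermal" patch: flat background with one hot pixel.
    patch = [[1.0] * 9 for _ in range(9)]
    patch[4][4] = 10.0
    for row in ca_cfar_2d(patch):
        print(row)
```

Because the threshold adapts to the local background rather than using one global cut-off, a detector of this kind is a natural fit for cheap prescreening ahead of deeper layers. How the paper actually uses CFAR may differ.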

Abstract

Multimodal data is rich and informative, but it also introduces interference and redundant information. To achieve optimal domain interpretation with limited resources, we propose CSDNet, a lightweight Cross Shallow and Deep Perception Network designed to integrate two modalities with less coherence, thereby discarding redundant information or even an entire modality. We implement CSDNet for the Salient Object Detection (SOD) task in robotic perception. The proposed method capitalises on spatial information prescreening and implicit coherence navigation across the shallow and deep layers of the depth-thermal (D-T) modality, prioritising integration over fusion to maximise scene interpretation. To further refine the descriptive capabilities of the encoder for the less-known D-T modalities, we also propose SAMAEP to guide an effective feature mapping to the generalised feature space. Our approach is tested on the VDT-2048 dataset. Leveraging the D-T modality, it outperforms, for the first time, SOTA methods that use the RGB-T or RGB-D modalities, and it achieves performance comparable to the RGB-D-T triple-modality benchmark method while running 5.97 times faster and requiring 0.0036 times the FLOPs. These results demonstrate that the proposed CSDNet effectively integrates the information from the D-T modality. The code will be released upon acceptance.
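The two efficiency figures quoted in the abstract are easier to compare once converted to the same form. The sketch below does that arithmetic, assuming that 0.0036 is the ratio of CSDNet FLOPs to the FLOPs of the RGB-D-T baseline (the reading used in the TL;DR). The function name `relative_cost` and the dictionary keys are made up for this example.

```python
# Ratios as reported in the abstract, CSDNet relative to the RGB-D-T baseline.
SPEEDUP = 5.97        # CSDNet is reported to run 5.97x faster
FLOPS_RATIO = 0.0036  # assumed to mean CSDNet FLOPs / baseline FLOPs


def relative_cost(speedup: float, flops_ratio: float) -> dict:
    """Convert the reported ratios into fractions and reduction factors."""
    return {
        # Fraction of the baseline runtime that CSDNet needs.
        "runtime_fraction": 1.0 / speedup,
        # How many times fewer FLOPs CSDNet needs than the baseline.
        "flops_reduction_factor": 1.0 / flops_ratio,
        # Share of the baseline FLOPs that is saved, in percent.
        "flops_saved_percent": (1.0 - flops_ratio) * 100.0,
    }


if __name__ == "__main__":
    for name, value in relative_cost(SPEEDUP, FLOPS_RATIO).items():
        print(f"{name}: {value:.2f}")
```

Under that reading, CSDNet needs about 17% of the baseline runtime and roughly 278 times fewer FLOPs, a saving of about 99.64%.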

Paper Structure (15 sections, 9 equations, 6 figures, 7 tables)

Introduction
Related Works
Dual-modal and Triple-modal Salient Object Detection
Segment Anything Model and Derived Works
Proposed Method
CFAR Saliency Prescreening Module
Implicit Coherence Activation Navigation Module
SAM-Assist Encoder Pre-training Framework
Loss Formulation
Experiments
Dataset and Evaluation Metrics
Implementation Details
Experimental Results
Ablation Analysis
Conclusion

Figures (6)

Figure 1: (a) The t-SNE representations of different modalities: (left) depth and thermal are highlighted; (right) the RGB modality is highlighted. (b) The visualised results of existing methods on the D-T modality; the RGB-dominated models show less capability in interpreting D-T data.
Figure 2: The overview of the proposed network CSDNet
Figure 3: The schematic of CFAR Saliency Prescreening Module
Figure 4: The schematic of SAM-assist depth encoder pre-training framework
Figure 5: Visual Comparison on VDT-2048 dataset
...and 1 more figure
