A Saliency Enhanced Feature Fusion based multiscale RGB-D Salient Object Detection Network

Rui Huang; Qingyi Zhao; Yan Xing; Sihua Gao; Weifeng Xu; Yuxiang Zhang; Wei Fan

TL;DR

This work tackles RGB-D salient object detection with multiscale CNNs that typically incur large model sizes. It introduces Saliency Enhanced Feature Fusion (SEFF), which uses neighboring-scale saliency maps to enrich feature fusion, formalized as $\mathbf{F} = \Phi(\mathbf{F}_1, \mathbf{F}_2, \mathbf{S})$, and integrates this into a three-scale detector SEFFSal built on FasterNet backbones with a CPR-based decoder. Training leverages an adaptive pixel intensity loss $\mathcal{L}_{API}$, combining BCE, IoU, and L1 terms, with a total loss $\mathcal{L} = \sum_{i=1}^{3} \sum_{j=1}^{4} \mathcal{L}_{API}(\mathbf{S}_{ij}, \mathbf{S}_{GT})$, achieving robust supervision across multiple scales. Experiments on five benchmarks show consistent improvements over ten SOTA RGB-D SOD methods, validating the effectiveness and practicality of saliency-guided cross-scale fusion for accurate, scalable salient object detection.
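The supervision scheme summarized above can be illustrated with a minimal sketch. It assumes each saliency map is a flattened list of probabilities in [0, 1] and that all predictions have already been resized to the ground-truth resolution. The function names (`bce`, `soft_iou`, `l1`, `api_like_loss`, `total_loss`) are hypothetical, and the adaptive per-pixel weighting of $\mathcal{L}_{API}$ is omitted because this summary does not define it; the sketch only shows an unweighted sum of BCE, IoU, and L1 terms, accumulated over 3 scales and 4 outputs per scale.

```python
import math


def bce(pred, gt, eps=1e-7):
    """Mean binary cross-entropy over flattened maps with values in [0, 1]."""
    total = 0.0
    for p, g in zip(pred, gt):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(g * math.log(p) + (1.0 - g) * math.log(1.0 - p))
    return total / len(pred)


def soft_iou(pred, gt, eps=1e-7):
    """Soft IoU loss: 1 - intersection / union, computed on probabilities."""
    inter = sum(p * g for p, g in zip(pred, gt))
    union = sum(p + g - p * g for p, g in zip(pred, gt))
    return 1.0 - (inter + eps) / (union + eps)


def l1(pred, gt):
    """Mean absolute error between prediction and ground truth."""
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)


def api_like_loss(pred, gt):
    """Unweighted stand-in for L_API (assumption: no adaptive pixel weights)."""
    return bce(pred, gt) + soft_iou(pred, gt) + l1(pred, gt)


def total_loss(preds, gt):
    """Sum the per-map loss over preds[i][j], i = scale index, j = output index."""
    return sum(api_like_loss(s_ij, gt) for scale in preds for s_ij in scale)


# Usage: 3 scales x 4 outputs, all compared against the same ground truth.
gt = [1.0, 0.0, 1.0, 0.0]
pred = [0.9, 0.2, 0.7, 0.1]
preds = [[pred for _ in range(4)] for _ in range(3)]
loss = total_loss(preds, gt)
```

Because every one of the 12 maps is compared against the same $\mathbf{S}_{GT}$, the total is simply the sum of 12 independent per-map losses.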

Abstract

Multiscale convolutional neural networks (CNNs) have demonstrated remarkable capabilities in solving various vision problems. However, fusing features of different scales always results in large model sizes, impeding the application of multiscale CNNs in RGB-D saliency detection. In this paper, we propose a customized feature fusion module, called Saliency Enhanced Feature Fusion (SEFF), for RGB-D saliency detection. SEFF uses the saliency maps of neighboring scales to enhance the features needed for fusion, resulting in more representative fused features. Our multiscale RGB-D saliency detector uses SEFF and processes images at three different scales. SEFF is used to fuse the features of the RGB and depth images, as well as the features of the decoders at different scales. Extensive experiments on five benchmark datasets demonstrate the superiority of our method over ten SOTA saliency detectors.
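To make the idea of saliency-guided fusion concrete, the following is a purely illustrative sketch of one possible form of $\mathbf{F} = \Phi(\mathbf{F}_1, \mathbf{F}_2, \mathbf{S})$. It is not the SEFF module itself, whose internal structure is given in the paper's Figure 2 and not reproduced here. The sketch assumes features are stored as `[channel][pixel]` lists, that the saliency map from the neighboring scale has already been resized to the same number of pixels, and that enhancement is a residual re-weighting `x * (1 + s)`; the name `seff_like_fuse` and this particular choice of $\Phi$ are hypothetical.

```python
def seff_like_fuse(f1, f2, s):
    """Fuse two feature tensors under guidance of a saliency map.

    f1, f2: features as [channel][pixel] lists with identical shapes.
    s:      saliency map as a [pixel] list with values in [0, 1].

    Assumption: each feature is enhanced by (1 + s) and the two enhanced
    features are summed. The real SEFF may differ substantially.
    """
    fused = []
    for c1, c2 in zip(f1, f2):
        # Salient pixels (s close to 1) are amplified; others pass through.
        fused.append([(a + b) * (1.0 + w) for a, b, w in zip(c1, c2, s)])
    return fused


# Usage: one channel, two pixels; only the second pixel is salient.
rgb_feat = [[1.0, 2.0]]
depth_feat = [[3.0, 4.0]]
saliency = [0.0, 1.0]
fused = seff_like_fuse(rgb_feat, depth_feat, saliency)
```

The same call pattern would apply both to RGB/depth fusion and to fusing decoder features across scales, since in each case two feature sets are combined under a shared saliency prior.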

Paper Structure

This paper contains 11 sections, 7 equations, 3 figures, and 1 table.

Introduction
Methodology
Overview
Saliency enhanced feature fusion module
SEFF-based multiscale RGB-D saliency detection
Implementation details
Experiment
Setup
Results and Analysis
Ablation study
Conclusion

Figures (3)

Figure 1: The framework of our multiscale RGB-D saliency detector.
Figure 2: The detailed structure of the proposed SEFF.
Figure 3: Some typical results of different RGB-D SOD methods on various scenes.
