
A Unified Structure for Efficient RGB and RGB-D Salient Object Detection

Peng Peng, Yong-Jie Li

TL;DR

A unified and efficient structure with a cross-attention context extraction (CRACE) module addresses both RGB and RGB-D SOD and outperforms other state-of-the-art methods on both tasks across various datasets and on most metrics.

Abstract

Salient object detection (SOD) has been well studied in recent years, especially with deep neural networks. However, SOD on RGB images and SOD on RGB-D images are usually treated as two different tasks, each requiring a specifically designed network structure. In this paper, we propose a unified and efficient structure with a cross-attention context extraction (CRACE) module to address both SOD tasks. The proposed CRACE module receives and appropriately fuses two inputs (for RGB SOD) or three inputs (for RGB-D SOD). The simple, unified feature pyramid network (FPN)-like structure built from CRACE modules conveys and refines the results under multi-level supervision of saliency and boundaries. The proposed structure is simple yet effective: it efficiently extracts and fuses the rich context information of RGB and depth. Experimental results show that our method outperforms other state-of-the-art methods in both RGB and RGB-D SOD tasks on various datasets and in terms of most metrics.
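
The control flow described in the abstract (one FPN-like decoder whose fusion step takes two inputs for RGB or three for RGB-D, with saliency and boundary losses at every level) can be illustrated with the minimal NumPy sketch below. It is not the authors' implementation, and several pieces are assumptions: `fuse` is a hypothetical placeholder that simply averages its inputs instead of performing the actual CRACE computation; backbone features are assumed to be already extracted, channel-aligned, ordered from deepest to shallowest, and related by integer scale factors; the 1x1 heads `w_sal`/`w_edge`, the erosion-based `boundary` labels, and the summed BCE loss are illustrative choices.

```python
import numpy as np


def upsample_to(x, h, w):
    """Nearest-neighbour upsampling of a (C, h0, w0) map to (C, h, w); assumes integer scale factors."""
    sy, sx = h // x.shape[1], w // x.shape[2]
    return np.repeat(np.repeat(x, sy, axis=1), sx, axis=2)


def fuse(f_i, F_prev=None, d_i=None):
    """Hypothetical stand-in for a CRACE module: averages 1-3 inputs, then applies ReLU."""
    inputs = [f_i]
    if F_prev is not None:  # coarser decoder output from the previous (deeper) level
        inputs.append(upsample_to(F_prev, *f_i.shape[1:]))
    if d_i is not None:  # depth feature, present only for RGB-D SOD
        inputs.append(d_i)
    return np.maximum(np.mean(inputs, axis=0), 0.0)


def decode(rgb_feats, depth_feats=None):
    """Top-down pass over backbone features ordered from deepest (coarsest) to shallowest."""
    outputs, F = [], None
    for i, f_i in enumerate(rgb_feats):
        d_i = None if depth_feats is None else depth_feats[i]
        F = fuse(f_i, F, d_i)  # the same code path serves RGB and RGB-D inputs
        outputs.append(F)
    return outputs


def bce(logits, target):
    """Mean binary cross-entropy computed from logits."""
    p = 1.0 / (1.0 + np.exp(-np.clip(logits, -50.0, 50.0)))
    p = np.clip(p, 1e-7, 1.0 - 1e-7)
    return float(-np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))


def downsample(mask, h, w):
    """Max-pool a binary (H, W) mask to (h, w); assumes H and W are multiples of h and w."""
    sy, sx = mask.shape[0] // h, mask.shape[1] // w
    return mask.reshape(h, sy, w, sx).max(axis=(1, 3))


def boundary(mask):
    """Boundary label: the mask minus its 3x3 erosion (an assumed way of deriving edge targets)."""
    H, W = mask.shape
    p = np.pad(mask, 1, mode="edge")
    eroded = np.min([p[dy:dy + H, dx:dx + W] for dy in range(3) for dx in range(3)], axis=0)
    return mask - eroded


def multi_level_loss(outputs, gt, w_sal, w_edge):
    """Sum of saliency and boundary BCE losses over every decoder level."""
    total = 0.0
    for F in outputs:
        _, h, w = F.shape
        m = downsample(gt, h, w)
        sal_logits = np.tensordot(w_sal, F, axes=1)  # 1x1 conv head: (C,) . (C, h, w) -> (h, w)
        edge_logits = np.tensordot(w_edge, F, axes=1)
        total += bce(sal_logits, m) + bce(edge_logits, boundary(m))
    return total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, sizes = 4, (4, 8, 16)
    rgb = [rng.standard_normal((C, s, s)) for s in sizes]
    depth = [rng.standard_normal((C, s, s)) for s in sizes]
    gt = np.zeros((16, 16))
    gt[4:12, 4:12] = 1.0
    w_sal, w_edge = rng.standard_normal(C), rng.standard_normal(C)
    for name, d in (("RGB", None), ("RGB-D", depth)):
        outs = decode(rgb, d)
        print(name, [o.shape for o in outs], multi_level_loss(outs, gt, w_sal, w_edge))
```

The point of the sketch is structural: RGB and RGB-D inputs go through the same `decode` loop, the depth branch is simply an optional extra input at each level, and every level's output receives its own saliency and boundary supervision.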

Paper Structure

This paper contains 29 sections, 19 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Overview of the proposed structure for RGB image SOD (blue arrows) and RGB-D image SOD (all blue and orange arrows). The proposed cross-attention context extraction (CRACE) module can extract and fuse cross-level features and even cross-modal features.
  • Figure 2: Cross-attention context extraction (CRACE) module. The feature blocks $f_i$ ($f_l$) and $F_{i-1}$ ($f_g$) (and the depth feature $d_i$) are first fused in the cross-attention block and then optimized in the channel attention block; additional information is extracted in the multi-scale block and the attentive fusion block. The output of the CRACE module ($F_i$) serves as one of the inputs to the next-level CRACE module.
  • Figure 3: The architecture of the proposed unified structure for the task of RGB image SOD (the part indicated by blue arrows in Fig. 1). Multi-level saliency and boundary supervision is also adopted in the training stage. For the task of RGB-D image SOD, the depth map is processed by the backbone network, and the resulting multi-level features are fed into the CRACE modules following the orange arrows in Fig. 1. Please see the text for more details.
  • Figure 4: Visual comparisons of the proposed method and state-of-the-art methods. Our method obtains more accurate results than others in both tasks of RGB and RGB-D image SOD.
  • Figure 5: Comparison of precision-recall curves. Our method obtains the best performance on four datasets and the second-best performance on DUT-OMRON.
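
The CRACE pipeline in the Figure 2 caption (cross-attention, then channel attention, then a multi-scale block and attentive fusion) can be sketched in NumPy as below. This is a hypothetical, weight-free illustration rather than the authors' module: the identity query/key/value projections, the parameter-free SE-style channel gating, the box-filter branches at scales 1, 3 and 5, and the mean-score branch weighting are all assumptions made to keep the example short. It follows the caption's naming, with $f_l = f_i$, $f_g = F_{i-1}$, and an optional depth feature $d_i$.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cross_attention(q, context):
    """Queries from the current level (C, N); keys/values from the other inputs (C, M).
    Identity projections are used here instead of learned ones (a simplification)."""
    C = q.shape[0]
    attn = softmax(q.T @ context / np.sqrt(C), axis=-1)  # (N, M) attention weights
    return q + (attn @ context.T).T  # residual connection, back to (C, N)


def channel_attention(x):
    """SE-style channel gating without learned weights: scale each channel by sigmoid(its mean)."""
    w = sigmoid(x.mean(axis=(1, 2)))
    return x * w[:, None, None]


def box_filter(x, k):
    """k x k mean filter with edge padding on a (C, H, W) map (one multi-scale branch)."""
    r = k // 2
    H, W = x.shape[1:]
    p = np.pad(x, ((0, 0), (r, r), (r, r)), mode="edge")
    out = np.zeros_like(x, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += p[:, dy:dy + H, dx:dx + W]
    return out / (k * k)


def attentive_fusion(branches):
    """Per-pixel softmax over branches (B, C, H, W), scored by each branch's channel mean."""
    weights = softmax(branches.mean(axis=1, keepdims=True), axis=0)  # (B, 1, H, W)
    return (weights * branches).sum(axis=0)


def crace(f_l, f_g, d=None, scales=(1, 3, 5)):
    """Hypothetical CRACE-style block: f_l is f_i, f_g is F_{i-1}, d is the optional depth feature d_i.
    All inputs share C channels; f_g and d may have any spatial size."""
    C, H, W = f_l.shape
    context = [f_g.reshape(C, -1)]
    if d is not None:
        context.append(d.reshape(C, -1))
    x = cross_attention(f_l.reshape(C, -1), np.concatenate(context, axis=1)).reshape(C, H, W)
    x = channel_attention(x)
    branches = np.stack([box_filter(x, k) for k in scales])  # multi-scale block
    return attentive_fusion(branches)  # output F_i, same shape as f_l


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f_l = rng.standard_normal((8, 16, 16))  # current-level feature f_i
    f_g = rng.standard_normal((8, 8, 8))    # coarser output F_{i-1} of the previous CRACE module
    d = rng.standard_normal((8, 16, 16))    # depth feature d_i (RGB-D only)
    print(crace(f_l, f_g).shape, crace(f_l, f_g, d).shape)  # (8, 16, 16) (8, 16, 16)
```

Because the current-level feature supplies the queries in this sketch, the output keeps the resolution of $f_i$, while $F_{i-1}$ and the depth feature only contribute keys and values. That is one plausible way a single module could accept either two or three inputs without changing its code path.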