DC-Net: Divide-and-Conquer for Salient Object Detection

Jiayi Zhu, Xuebin Qin, Abdulmotaleb Elsaddik

TL;DR

DC-Net exploits the parallelism inherent in Divide-and-Conquer: Parallel Acceleration speeds up the network, which achieves competitive performance on six LR-SOD and five HR-SOD datasets at high efficiency (60 FPS and 55 FPS).

Abstract

In this paper, we introduce Divide-and-Conquer into the salient object detection (SOD) task so that the model can learn prior knowledge that is useful for predicting the saliency map. We design a novel network, the Divide-and-Conquer Network (DC-Net), which uses two encoders to solve different subtasks that are conducive to predicting the final saliency map: here, predicting edge maps of width 4 and location maps of salient objects. The resulting feature maps, which carry different semantic information, are then aggregated in the decoder to predict the final saliency map. The decoder of DC-Net consists of our newly designed two-level Residual nested-ASPP (ResASPP$^{2}$) modules, which capture a large number of different-scale features with a small number of convolution operations, maintain high resolution throughout, and obtain a large and compact effective receptive field (ERF). Exploiting the parallelism of Divide-and-Conquer, we use Parallel Acceleration to speed up DC-Net, allowing it to achieve competitive performance on six LR-SOD and five HR-SOD datasets at high efficiency (60 FPS and 55 FPS). Code and results are available at https://github.com/PiggyJerry/DC-Net.
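
To make the "edge maps with width 4" subtask concrete, the following is a minimal sketch of how such an auxiliary map could be derived from a binary saliency mask. It assumes the edge band is the set of object pixels removed by `width` rounds of 3x3 binary erosion; this construction and the helper names `erode` and `edge_map` are illustrative assumptions, not taken from the paper or its repository.

```python
import numpy as np


def erode(mask: np.ndarray) -> np.ndarray:
    """One round of binary erosion with a 3x3 square structuring element.

    Pixels outside the image are treated as background, so an object that
    touches the image border also gets an edge band along that border.
    """
    h, w = mask.shape
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    out = np.ones((h, w), dtype=bool)
    for dy in range(3):
        for dx in range(3):
            # A pixel survives only if all 9 pixels in its neighbourhood are set.
            out &= padded[dy:dy + h, dx:dx + w]
    return out


def edge_map(mask: np.ndarray, width: int) -> np.ndarray:
    """Object pixels lying in a band of `width` pixels along the object boundary.

    Assumption (hypothetical): the band is defined by repeated erosion.
    """
    mask = mask.astype(bool)
    inner = mask
    for _ in range(width):
        inner = erode(inner)
    return mask & ~inner


# Toy example: a 12x12 square object inside a 20x20 image.
mask = np.zeros((20, 20), dtype=bool)
mask[4:16, 4:16] = True
edge4 = edge_map(mask, 4)
print(edge4.sum())  # 128: the 12*12 object minus its 4*4 eroded core
```

Under this assumption, a wider band gives the edge encoder more positive pixels to learn from, at the cost of a less precise boundary target.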

Paper Structure (21 sections, 3 equations, 13 figures, 8 tables)

Figures (13)

  • Figure 1: Comparison of FPS and performance between our DC-Net-R and other state-of-the-art convolution-based SOD methods. The $F_{\beta}^{w}$ measure is computed on the DUT-OMRON dataset (Yang et al., 2013). The red star denotes our DC-Net-R (Ours-R, 60 FPS), and the red dotted line marks the real-time (60 FPS) threshold.
  • Figure 2: Some examples of different auxiliary maps. (c) represents the location information of the salient object. The sum of (d) and (e) equals (b). (f)-(j) represent the edge pixels of salient objects with widths 1, 2, 3, 4, and 5, respectively.
  • Figure 3: Illustration of our proposed DC-Net architecture. DC-Net has two encoders and a decoder; the two encoders can be considered one parallel encoder. Thus, the main architecture of DC-Net is a U-Net-like Encoder-Decoder, where each stage of the decoder consists of our newly proposed two-level Residual nested-ASPP module (ResASPP$^{2}$).
  • Figure 4: Illustration of existing multi-scale feature fusion modules and our proposed two-level Residual nested-ASPP module: (a) ASPP-like module, (b) PPM-like module, (c) RSU module and its extension, the RSFPN module, where $L$ is the number of layers in the encoder, (d) our two-level Residual nested-ASPP module ResASPP$^{2}$.
  • Figure 6: Illustration of the parallel encoder and merged convolution. 'MM' means Matrix Multiplication. A convolution operation can be separated into three parts: an unfold operation, a matrix multiplication, and a fold operation.
  • ...and 8 more figures
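
The decomposition named in the Figure 6 caption (unfold, matrix multiplication, fold) can be sketched in NumPy as follows. This is a minimal illustration for stride 1 and no padding, not the authors' implementation. The function names are hypothetical, and `merged_conv` reflects one plausible reading of "merged convolution": the two encoder branches keep separate inputs and weights but share a single batched matrix multiplication.

```python
import numpy as np


def unfold(x: np.ndarray, k: int) -> np.ndarray:
    """im2col: (C, H, W) -> (C*k*k, L), one column per k x k patch (stride 1)."""
    c, h, w = x.shape
    oh, ow = h - k + 1, w - k + 1
    cols = np.empty((c * k * k, oh * ow), dtype=x.dtype)
    row = 0
    for ci in range(c):
        for dy in range(k):
            for dx in range(k):
                # Row order (channel, dy, dx) matches weight.reshape(O, -1).
                cols[row] = x[ci, dy:dy + oh, dx:dx + ow].reshape(-1)
                row += 1
    return cols


def conv_via_matmul(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Convolution (cross-correlation) as unfold -> matmul -> fold.

    x: (C, H, W); weight: (O, C, k, k); returns (O, H-k+1, W-k+1).
    """
    o, _, k, _ = weight.shape
    oh, ow = x.shape[1] - k + 1, x.shape[2] - k + 1
    cols = unfold(x, k)                    # unfold
    out = weight.reshape(o, -1) @ cols     # matrix multiplication: (O, L)
    return out.reshape(o, oh, ow)          # fold columns back into a map


def conv_direct(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Reference sliding-window implementation used only for checking."""
    o, _, k, _ = weight.shape
    oh, ow = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.zeros((o, oh, ow), dtype=x.dtype)
    for oi in range(o):
        for y in range(oh):
            for xx in range(ow):
                out[oi, y, xx] = np.sum(x[:, y:y + k, xx:xx + k] * weight[oi])
    return out


def merged_conv(xs: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Hypothetical merged convolution over B parallel branches.

    xs: (B, C, H, W), one input per branch; weights: (B, O, C, k, k).
    All branches are computed by a single batched matrix multiplication.
    """
    b, o, _, k, _ = weights.shape
    oh, ow = xs.shape[2] - k + 1, xs.shape[3] - k + 1
    cols = np.stack([unfold(x, k) for x in xs])        # (B, C*k*k, L)
    out = np.matmul(weights.reshape(b, o, -1), cols)   # (B, O, L)
    return out.reshape(b, o, oh, ow)


rng = np.random.default_rng(0)
x = rng.standard_normal((3, 7, 8))
w = rng.standard_normal((4, 3, 3, 3))
print(np.allclose(conv_via_matmul(x, w), conv_direct(x, w)))
```

Because both branches reduce to matrix multiplications of the same shape, stacking them along a batch axis lets one kernel launch serve both encoders, which is the kind of saving a parallel encoder can exploit.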