Dual Mutual Learning Network with Global-local Awareness for RGB-D Salient Object Detection

Kang Yi; Haoran Tang; Yumeng Li; Jing Xu; Jun Zhang

TL;DR

This paper proposes GL-DMNet, a novel dual mutual learning network with global-local awareness that combines a position mutual fusion module and a channel mutual fusion module to exploit the interdependencies among different modalities in the spatial and channel dimensions.

Abstract

RGB-D salient object detection (SOD), which aims to highlight the prominent regions of a given scene by jointly modeling RGB and depth information, is a challenging pixel-level prediction task. Recently, dual-attention mechanisms have been applied to this task because of their ability to strengthen the detection process. However, most existing methods directly fuse attentional cross-modality features under a manual-mandatory fusion paradigm without considering the inherent discrepancy between RGB and depth, which may reduce performance. Moreover, the long-range dependencies derived from global and local information make it difficult to apply a unified, efficient fusion strategy. Hence, in this paper, we propose GL-DMNet, a novel dual mutual learning network with global-local awareness. Specifically, we present a position mutual fusion module and a channel mutual fusion module to exploit the interdependencies among different modalities in the spatial and channel dimensions. In addition, we adopt an efficient decoder based on cascade transformer-infused reconstruction to jointly integrate multi-level fusion features. Extensive experiments on six benchmark datasets demonstrate that the proposed GL-DMNet performs better than 24 RGB-D SOD methods, achieving an average improvement of ~3% across four evaluation metrics compared to the second-best model (S3Net). Code and results are available at https://github.com/kingkung2016/GL-DMNet.
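
To make the idea of fusing two modalities along the spatial and channel dimensions more concrete, the following is a minimal NumPy sketch of generic cross-modal attention. It is not the authors' PMF or CMF implementation: the function names, the scaled dot-product form, the residual connections, and the final summation are all illustrative assumptions, and the released code at the URL above should be treated as the reference.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def position_cross_attention(query_feat, context_feat):
    """Hypothetical spatial cross-attention between two (C, H, W) feature maps.

    Each spatial position of query_feat attends over all positions of
    context_feat; the attended context is added back as a residual.
    """
    c, h, w = query_feat.shape
    q = query_feat.reshape(c, h * w)          # (C, N)
    k = context_feat.reshape(c, h * w)        # (C, N)
    attn = softmax(q.T @ k / np.sqrt(c))      # (N, N), each row sums to 1
    attended = k @ attn.T                     # (C, N)
    return query_feat + attended.reshape(c, h, w)


def channel_cross_attention(query_feat, context_feat):
    """Hypothetical channel cross-attention between two (C, H, W) feature maps.

    Each channel of query_feat attends over all channels of context_feat.
    """
    c, h, w = query_feat.shape
    q = query_feat.reshape(c, h * w)          # (C, N)
    k = context_feat.reshape(c, h * w)        # (C, N)
    attn = softmax(q @ k.T / np.sqrt(h * w))  # (C, C), each row sums to 1
    attended = attn @ k                       # (C, N)
    return query_feat + attended.reshape(c, h, w)


def dual_mutual_fusion(rgb_feat, depth_feat):
    """Assumed fusion rule: apply both attentions in both directions and sum."""
    spatial = (position_cross_attention(rgb_feat, depth_feat)
               + position_cross_attention(depth_feat, rgb_feat))
    channel = (channel_cross_attention(rgb_feat, depth_feat)
               + channel_cross_attention(depth_feat, rgb_feat))
    return spatial + channel


rng = np.random.default_rng(0)
rgb = rng.standard_normal((8, 4, 4))    # toy RGB feature map (C=8, H=W=4)
depth = rng.standard_normal((8, 4, 4))  # toy depth feature map
fused = dual_mutual_fusion(rgb, depth)
print(fused.shape)
```

In this sketch "mutual" only means that each modality serves once as the query and once as the context. The spatial branch builds an N x N affinity over positions and the channel branch a C x C affinity over channels, which is the general pattern that dual-attention designs follow.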

Paper Structure (19 sections, 17 equations, 9 figures, 5 tables)

Introduction
Related work
RGB-D salient object detection
Vision Transformer
Attention mechanism
Methodology
Overview
Dual mutual learning module
Position mutual fusion module
Channel mutual fusion module
Cascade Transformer-Infused Reconstruction Decoder
Loss function
Experiments
Datasets and evaluation metrics
Implementation details
...and 4 more sections

Figures (9)

Figure 1: The results of our GL-DMNet and other representative methods, including CATNet [CATNet], HiDANet [wu2023hidanet], and TriTransNet [liu2021tritransnet].
Figure 2: Comparison between (a) FPN framework, (b) dense decode network, (c) group transformer network, (d) visual transformer FPN, (e) triplet transformer embedding network, and our (f) transformer-infused reconstruction network.
Figure 3: Detailed framework of the proposed GL-DMNet. We adopt the ResNet-50 network to extract features of RGB and depth inputs, respectively. Then, position mutual fusion (PMF) and channel mutual fusion (CMF) are proposed to fuse the multi-modal features. The fused features of all stages are decoded by the cascade transformer-infused reconstruction network. The saliency head [fan2021rethinking] is also added to generate the final predicted feature maps.
Figure 4: Details of the position mutual fusion (PMF) module and the channel mutual fusion (CMF) module.
Figure 5: Visual comparisons of the proposed GL-DMNet and other state-of-the-art RGB-D SOD methods, including MIRV [li2024mutual], HINet [bi2023cross], DLMNet [yang2022depth], CCAFNet [zhou2022ccafnet], DENet [xu2022weakly], MoADNet [jin2022moadnet], MMNet [gao2022unified], CMINet [yi2022cross], and DCF [sun2021deep]. Our approach obtains competitive performance in a variety of challenging scenarios.
...and 4 more figures
