CalibNet: Dual-branch Cross-modal Calibration for RGB-D Salient Instance Segmentation

Jialun Pei; Tao Jiang; He Tang; Nian Liu; Yueming Jin; Deng-Ping Fan; Pheng-Ann Heng

TL;DR

CalibNet advances RGB-D salient instance segmentation by introducing a dual-branch cross-modal calibration framework that tightly fuses depth and RGB information in both the kernel and mask branches. The Dynamic Interactive Kernel and Weight-Sharing Fusion modules, together with a Depth Similarity Assessment, enable instance-aware kernel generation and robust mask feature calibration, all trained with bipartite matching. The paper also contributes the DSIS dataset, providing a higher-quality, multi-category RGB-D SIS benchmark for generalization studies. Empirical results show state-of-the-art performance on COME15K and DSIS across multiple setups, with real-time inference and strong robustness to depth quality variations, highlighting the practical impact of cross-modal calibration in multi-modal segmentation tasks.
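
To make the bipartite matching step concrete, here is a minimal sketch of one-to-one assignment between predicted and ground-truth instance masks. The Dice-based cost, the brute-force search, and the names `dice` and `match` are illustrative assumptions, not the paper's actual matching cost or code; practical implementations typically use the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`) instead of enumerating permutations.

```python
from itertools import permutations


def dice(pred, gt):
    """Dice overlap between two flattened masks with values in [0, 1]."""
    inter = sum(p * g for p, g in zip(pred, gt))
    total = sum(pred) + sum(gt)
    return 2.0 * inter / total if total else 1.0


def match(preds, gts):
    """Return a tuple `assign` where assign[j] is the index of the
    prediction matched to ground-truth mask j (hypothetical helper).

    Brute force over all one-to-one assignments; only suitable for a
    handful of instances. The cost 1 - Dice is an assumption.
    """
    assert len(preds) >= len(gts), "need at least one prediction per target"
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(1.0 - dice(preds[i], gts[j]) for j, i in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return best


# Three predictions, two ground-truth instances (masks flattened to 4 pixels).
preds = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 0]]
gts = [[1, 1, 0, 0], [0, 0, 1, 1]]
assignment = match(preds, gts)
```

Each ground-truth instance receives exactly one prediction, so the training loss can be computed per matched pair; the unmatched prediction would be treated as background.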

Abstract

We propose a novel approach for RGB-D salient instance segmentation using a dual-branch cross-modal feature calibration architecture called CalibNet. Our method simultaneously calibrates depth and RGB features in the kernel and mask branches to generate instance-aware kernels and mask features. CalibNet consists of three simple modules. A dynamic interactive kernel (DIK) and a weight-sharing fusion (WSF) work together to generate effective instance-aware kernels and integrate cross-modal features. A depth similarity assessment (DSA) module is applied before DIK and WSF to improve the quality of depth features. We also contribute a new DSIS dataset, which contains 1,940 images with elaborate instance-level annotations. Extensive experiments on three challenging benchmarks show that CalibNet yields a promising result, i.e., 58.0% AP with a 320×480 input size on the COME15K-N test set, which significantly surpasses the alternative frameworks. Our code and dataset are available at: https://github.com/PJLallen/CalibNet.
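
As an illustration of how instance-aware kernels and mask features can combine into instance masks, the sketch below applies each kernel as a 1×1 dynamic convolution over shared mask features and passes the result through a sigmoid. This is a generic dynamic-kernel formulation written under assumptions: the function name `predict_masks`, the tensor layout, and the toy values are hypothetical, and the cross-modal calibration performed by DIK, WSF, and DSA is not modeled here.

```python
import math


def predict_masks(kernels, mask_features):
    """Apply N instance kernels to shared mask features.

    kernels:       N lists of C floats (one kernel per candidate instance).
    mask_features: C channels, each an H x W grid of floats.
    Returns N masks, each an H x W grid of probabilities.
    """
    channels = len(mask_features)
    height = len(mask_features[0])
    width = len(mask_features[0][0])
    masks = []
    for kernel in kernels:
        assert len(kernel) == channels, "kernel length must equal channel count"
        mask = []
        for y in range(height):
            row = []
            for x in range(width):
                # 1x1 dynamic convolution: dot product over channels.
                logit = sum(kernel[c] * mask_features[c][y][x] for c in range(channels))
                row.append(1.0 / (1.0 + math.exp(-logit)))
            mask.append(row)
        masks.append(mask)
    return masks


# Toy example: 2 channels, a 1 x 2 feature map, 2 kernels.
features = [[[10.0, -10.0]], [[-10.0, 10.0]]]
kernels = [[1.0, 0.0], [0.0, 1.0]]
masks = predict_masks(kernels, features)
```

In this toy case the first kernel responds to the left pixel and the second to the right pixel, so each kernel selects a different instance from the same feature map.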

Paper Structure (27 sections, 8 equations, 13 figures, 12 tables)

Introduction
Related Work
RGB-D Salient Object Detection
Salient Instance Segmentation
Instance Segmentation with RGB-D Data
DSIS Dataset
Dataset Statistics
Dataset Analysis and Comparison
Center Bias
Instance Size Distribution
Objects/Instances Consistency
Depth/Saliency Consistency
Proposed CalibNet
Overall Architecture
Model Encoder
...and 12 more sections

Figures (13)

Figure 1: Illustration of the RGB-D salient instance segmentation task with the proposed CalibNet predictions. Our method propels RGB-D saliency detection to instance-level identification.
Figure 2: Comparison of two kinds of fusion architectures for RGB-D instance-level segmentation. (a) Proposal fusion in a two-stage manner [xu2020outdoor]; (b) Our dual-branch fusion in a one-stage manner.
Figure 3: Example of a diverse annotation of the proposed DSIS dataset.
Figure 4: Distribution of the DSIS dataset. Left: Distribution of image sources collected from RGB-D SOD datasets. Right: Distribution of the number of salient instances in each sample.
Figure 5: Comparison between the proposed DSIS and existing datasets for RGB-D SIS task. (a) Distribution of instance sizes in all test sets; (b) Comparison of the consistency between salient object-level ground truth and binarized instance-level ground truth; (c) Consistency of the salient object ground truth with the binarized depth map.
...and 8 more figures
