CalibNet: Dual-branch Cross-modal Calibration for RGB-D Salient Instance Segmentation

Jialun Pei, Tao Jiang, He Tang, Nian Liu, Yueming Jin, Deng-Ping Fan, Pheng-Ann Heng

TL;DR

CalibNet advances RGB-D salient instance segmentation by introducing a dual-branch cross-modal calibration framework that tightly fuses depth and RGB information in both the kernel and mask branches. The Dynamic Interactive Kernel and Weight-Sharing Fusion modules, together with a Depth Similarity Assessment, enable instance-aware kernel generation and robust mask feature calibration, all trained with bipartite matching. The paper also contributes the DSIS dataset, providing a higher-quality, multi-category RGB-D SIS benchmark for generalization studies. Empirical results show state-of-the-art performance on COME15K and DSIS across multiple setups, with real-time inference and strong robustness to depth quality variations, highlighting the practical impact of cross-modal calibration in multi-modal segmentation tasks.
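The bipartite-matching step mentioned above can be illustrated with a small sketch. It pairs each ground-truth instance with a distinct predicted mask so that the total overlap is maximal. The IoU-based score, the brute-force search (a stand-in for the Hungarian algorithm on tiny inputs), and the names `mask_iou` and `match_predictions` are assumptions made for illustration; the summary does not specify the paper's actual matching cost.

```python
from itertools import permutations


def mask_iou(a, b):
    """IoU between two binary masks given as flat lists of 0/1."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 0.0


def match_predictions(pred_masks, gt_masks):
    """Return ((pred_idx, gt_idx) pairs, total IoU) for the best one-to-one match.

    Brute force over all assignments (hypothetical helper, small inputs only);
    assumes len(pred_masks) >= len(gt_masks).
    """
    n_pred, n_gt = len(pred_masks), len(gt_masks)
    iou = [[mask_iou(p, g) for g in gt_masks] for p in pred_masks]
    best_pairs, best_score = [], -1.0
    # Each permutation picks one distinct prediction per ground-truth instance.
    for chosen in permutations(range(n_pred), n_gt):
        score = sum(iou[p][g] for g, p in enumerate(chosen))
        if score > best_score:
            best_score = score
            best_pairs = [(p, g) for g, p in enumerate(chosen)]
    return best_pairs, best_score


# Three predicted masks and two ground-truth instances on a 6-pixel image.
preds = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 0],
]
gts = [
    [0, 0, 0, 0, 1, 1],
    [1, 1, 1, 0, 0, 0],
]
pairs, total = match_predictions(preds, gts)
```

In this toy case, prediction 1 is matched to ground truth 0 and prediction 0 to ground truth 1, while prediction 2 stays unmatched and would typically be treated as background during training.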

Abstract

We propose a novel approach for RGB-D salient instance segmentation using a dual-branch cross-modal feature calibration architecture called CalibNet. Our method simultaneously calibrates depth and RGB features in the kernel and mask branches to generate instance-aware kernels and mask features. CalibNet consists of three simple modules: a dynamic interactive kernel (DIK) and a weight-sharing fusion (WSF), which work together to generate effective instance-aware kernels and integrate cross-modal features, and a depth similarity assessment (DSA) module, which is applied prior to DIK and WSF to improve the quality of depth features. We also contribute a new DSIS dataset, which contains 1,940 images with elaborate instance-level annotations. Extensive experiments on three challenging benchmarks show that CalibNet yields promising results, e.g., 58.0% AP with a 320×480 input size on the COME15K-N test set, significantly surpassing alternative frameworks. Our code and dataset are available at: https://github.com/PJLallen/CalibNet.
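To make the roles of "instance-aware kernels" and "mask features" more concrete, the sketch below shows one common way kernel-based segmenters combine the two: each kernel is a per-instance weight vector that is multiplied with the shared mask features to give one soft mask per instance. This is an assumption about the general technique, not a description of CalibNet's exact head; `apply_kernels` and the toy values are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def apply_kernels(kernels, mask_features):
    """Combine N instance kernels with shared mask features.

    kernels: N x C list of per-instance weights (hypothetical layout).
    mask_features: C x P list, one row per channel over P flattened pixels.
    Returns N x P soft masks with values in (0, 1).
    """
    n_pixels = len(mask_features[0])
    masks = []
    for kernel in kernels:
        # Weighted sum over channels at every pixel, i.e. a 1x1 dynamic convolution.
        logits = [
            sum(w * mask_features[c][p] for c, w in enumerate(kernel))
            for p in range(n_pixels)
        ]
        masks.append([sigmoid(v) for v in logits])
    return masks


# Two feature channels over four pixels: channel 0 responds on the left
# half of the image, channel 1 on the right half.
features = [
    [4.0, 4.0, -4.0, -4.0],
    [-4.0, -4.0, 4.0, 4.0],
]
# One kernel per instance; each selects a single channel here.
kernels = [[1.0, 0.0], [0.0, 1.0]]

soft = apply_kernels(kernels, features)
binary = [[1 if v > 0.5 else 0 for v in m] for m in soft]
```

Thresholding the soft masks at 0.5 gives one binary mask per kernel, so each kernel effectively "selects" its own instance from the shared feature map.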

Paper Structure

This paper contains 27 sections, 8 equations, 13 figures, 12 tables.

Figures (13)

  • Figure 1: Illustration of the RGB-D salient instance segmentation task with the proposed CalibNet predictions. Our method propels RGB-D saliency detection to instance-level identification.
  • Figure 2: Comparison of two kinds of fusion architectures for RGB-D instance-level segmentation. (a) Proposal fusion in a two-stage manner [xu2020outdoor]; (b) Our dual-branch fusion in a one-stage manner.
  • Figure 3: Example of a diverse annotation of the proposed DSIS dataset.
  • Figure 4: Distribution of the DSIS dataset. Left: Distribution of image sources collected from RGB-D SOD datasets. Right: Distribution of the number of salient instances in each sample.
  • Figure 5: Comparison between the proposed DSIS and existing datasets for the RGB-D SIS task. (a) Distribution of instance sizes in all test sets; (b) Comparison of the consistency between salient object-level ground truth and binarized instance-level ground truth; (c) Consistency of the salient object ground truth with the binarized depth map.
  • ...and 8 more figures