Leveraging RGB-D Data with Cross-Modal Context Mining for Glass Surface Detection

Jiaying Lin, Yuen-Hei Yeung, Shuquan Ye, Rynson W. H. Lau

TL;DR

This work tackles glass surface detection by leveraging RGB-D data to address the shortcomings of RGB-only methods in scenes with ambiguous boundaries. It introduces a large-scale RGB-D GSD dataset and a cross-modal framework that combines a Cross-Modal Context Mining (CCM) module with a Depth-Missing Aware Attention (DAA) module to exploit depth context and missing-depth cues. Empirical results show the proposed approach outperforms state-of-the-art RGB-based methods and RGB-D baselines on multiple datasets, highlighting the value of depth information for robust glass detection. The approach has practical significance for autonomous systems, enabling safer navigation in real-world environments where depth gaps around glass surfaces provide informative cues.

Abstract

Glass surfaces are becoming increasingly ubiquitous as modern buildings tend to use a lot of glass panels. This, however, poses substantial challenges to the operations of autonomous systems such as robots, self-driving cars, and drones, as these glass panels can become transparent obstacles to navigation. Existing works attempt to exploit various cues, including glass boundary context or reflections, as priors. However, they are all based on input RGB images. We observe that the transmission of 3D depth sensor light through glass surfaces often produces blank regions in the depth maps, which can offer additional insights to complement the RGB image features for glass surface detection. In this work, we first propose a large-scale RGB-D glass surface detection dataset, RGB-D GSD, for rigorous experiments and future research. It contains 3,009 images, paired with precise annotations, offering a wide range of real-world RGB-D glass surface categories. We then propose a novel glass surface detection framework combining RGB and depth information, with two novel modules: a cross-modal context mining (CCM) module to adaptively learn individual and mutual context features from RGB and depth information, and a depth-missing aware attention (DAA) module to explicitly exploit spatial locations where missing depths occur to help detect the presence of glass surfaces. Experimental results show that our proposed model outperforms state-of-the-art methods.
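
The observation that depth sensors often return blank regions behind glass can be illustrated with a small sketch. The code below is not the paper's DAA module. It only shows, under stated assumptions, how a missing-depth mask might be extracted from a depth map and compared against a glass mask. The function names `missing_depth_mask` and `mask_iou`, the convention that a missing reading is stored as 0 or a non-finite value, and the toy arrays are all hypothetical choices made for illustration.

```python
import math

def missing_depth_mask(depth, invalid_value=0.0):
    """Return a 2D boolean mask that is True where a depth reading is missing.

    Assumption: missing readings are stored as `invalid_value` or as a
    non-finite number (NaN/inf). Real sensors may use other conventions.
    """
    return [
        [(d == invalid_value) or not math.isfinite(d) for d in row]
        for row in depth
    ]

def mask_iou(a, b):
    """Intersection-over-union of two 2D boolean masks of equal shape."""
    inter = 0
    union = 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            inter += bool(x and y)
            union += bool(x or y)
    # Two empty masks are treated as a perfect match.
    return 1.0 if union == 0 else inter / union

# Toy 3x4 depth map in metres (hypothetical values, not from the dataset).
depth = [
    [1.2, 1.3, 0.0, 0.0],
    [1.1, 0.0, 0.0, 0.0],
    [1.0, 1.0, float("nan"), 2.0],
]
# Toy ground-truth glass mask for the same scene (also hypothetical).
glass = [
    [False, False, True, True],
    [False, True, True, True],
    [False, False, False, False],
]

missing = missing_depth_mask(depth)
overlap = mask_iou(missing, glass)
print(f"IoU between missing-depth mask and glass mask: {overlap:.3f}")
```

In this toy case the missing-depth mask overlaps the glass region heavily but not perfectly, which matches the abstract's framing of missing depth as a complementary cue to be combined with RGB features, not a detector on its own.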

Paper Structure

This paper contains 10 sections, 4 equations, 9 figures, and 3 tables.

Figures (9)

  • Figure 1: Advantages of detecting glass surfaces with RGB-D images. These examples show that the depth map can provide a strong cue for glass surface detection. The state-of-the-art methods GSDNet [GSD:2021] and EBLNet [He_2021_ICCV], which rely only on input RGB images, cannot correctly separate the glass surfaces from the background. By learning the cross-modal contexts and the correlation between depth-missing regions and glass surface regions, our proposed model detects the glass surfaces accurately in all three challenging scenes. Note that red regions in the depth images represent missing depths.
  • Figure 2: Examples from our RGB-D GSD dataset. The top, middle, and bottom rows show RGB images, depth maps, and ground-truth (GT) glass surface masks overlaid on the images, respectively.
  • Figure 3: Statistics of our proposed dataset.
  • Figure 4: The pipeline of our proposed framework.
  • Figure 5: Illustration of our proposed cross-modal context mining (CCM) module.
  • ...and 4 more figures