Geometry-Aware Instance Segmentation with Disparity Maps

Cho-Ying Wu, Xiaoyan Hu, Michael Happold, Qiangeng Xu, Ulrich Neumann

TL;DR

This work addresses outdoor instance segmentation by integrating stereo disparity with RGB information through GAIS-Net, enabling geometry-aware mask regression across 2D, 2.5D, and 3D ROI representations. By back-projecting disparities into 3D and employing a PointNet-based 3D mask pipeline alongside image-based 2.5D masks, the method leverages geometric priors to improve segmentation under occlusion and suppress false positives. The framework includes a mask continuity loss, a self-supervised representation correspondence loss, and a MaskIoU-driven fusion scheme that combines predictions from multiple representations. The authors introduce the HQDS dataset, which has a longer baseline and higher resolution, and achieve state-of-the-art results on HQDS and competitive gains on Cityscapes, highlighting the practical value of stereo geometry for autonomous driving applications.
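The back-projection step mentioned above (turning a disparity map into a pseudo-lidar point cloud) can be sketched as follows. This is a minimal illustration assuming a rectified stereo pair with square pixels (a single focal length `f` in pixels), principal point `(cx, cy)`, and baseline `B` in meters; the function name `disparity_to_points` is hypothetical, and the per-ROI cropping and PointNet-based mask head of the actual pipeline are not shown.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def disparity_to_points(
    disparity: List[List[float]],
    f: float,
    baseline: float,
    cx: float,
    cy: float,
    min_disp: float = 1e-6,
) -> List[Point]:
    """Back-project a disparity map into camera-frame 3D points (hypothetical helper).

    Assumes rectified stereo with square pixels:
    Z = f * B / d, X = (u - cx) * Z / f, Y = (v - cy) * Z / f.
    """
    points: List[Point] = []
    for v, row in enumerate(disparity):
        for u, d in enumerate(row):
            if d <= min_disp:
                # Zero or negative disparity carries no valid depth; skip the pixel.
                continue
            z = f * baseline / d
            points.append(((u - cx) * z / f, (v - cy) * z / f, z))
    return points


# Usage: a tiny 2x2 disparity map (pixels), f = 100 px, B = 0.5 m.
pts = disparity_to_points([[0.0, 10.0], [20.0, 40.0]], f=100.0, baseline=0.5, cx=0.0, cy=0.0)
print(len(pts))  # 3: the zero-disparity pixel is dropped
```

Larger disparities map to closer points (here 40 px gives 1.25 m versus 5 m for 10 px), which is why distant objects occupy only a narrow band of disparity values.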

Abstract

Most previous work on outdoor instance segmentation for images uses only color information. We explore a novel direction of sensor fusion that exploits stereo cameras. Geometric information from disparities helps separate overlapping objects of the same or different classes. Moreover, geometric information penalizes region proposals with unlikely 3D shapes, thus suppressing false positive detections. Mask regression is based on 2D, 2.5D, and 3D ROIs using pseudo-lidar and image-based representations. These mask predictions are fused by a mask scoring process. However, public datasets only adopt stereo systems with shorter baselines and focal lengths, which limit the measuring range of stereo cameras. We collect and use the High-Quality Driving Stereo (HQDS) dataset, which has a much longer baseline and focal length and a higher resolution. Our method attains state-of-the-art performance.
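Why a longer baseline and focal length extend the measuring range follows from the stereo relation Z = fB/d: differentiating gives |dZ/dd| = Z²/(fB), so a fixed disparity error Δd produces a depth error that grows quadratically with distance and shrinks in proportion to f·B. For a fixed field of view, a higher image resolution also raises f when measured in pixels. The sketch below evaluates this first-order approximation; the rig parameters are hypothetical round numbers for illustration, not the actual HQDS or public-dataset specifications.

```python
import math


def depth_error(z: float, f: float, baseline: float, disp_err: float = 1.0) -> float:
    """First-order depth uncertainty (m) at depth z (m): Z^2 * dd / (f * B).

    f is the focal length in pixels, baseline in meters, disp_err in pixels.
    """
    return z * z * disp_err / (f * baseline)


def max_range(f: float, baseline: float, tol: float, disp_err: float = 1.0) -> float:
    """Largest depth (m) whose first-order error stays within tol meters."""
    return math.sqrt(tol * f * baseline / disp_err)


# Hypothetical rigs: a short-baseline camera vs. a longer-baseline, higher-resolution one.
short_rig = {"f": 1000.0, "baseline": 0.5}
long_rig = {"f": 2000.0, "baseline": 1.0}

for name, rig in [("short", short_rig), ("long", long_rig)]:
    err = depth_error(50.0, rig["f"], rig["baseline"])
    rng = max_range(rig["f"], rig["baseline"], tol=1.0)
    print(f"{name}: error at 50 m = {err:.2f} m, range for 1 m error = {rng:.1f} m")
# short: error at 50 m = 5.00 m, range for 1 m error = 22.4 m
# long: error at 50 m = 1.25 m, range for 1 m error = 44.7 m
```

Doubling both f and B cuts the error at a given depth by a factor of four and doubles the range reachable at a fixed error tolerance.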

Paper Structure

This paper contains 9 sections, 3 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: GAIS-Net results on the HQDS dataset. The left column shows stereo left images with histogram equalization to enhance contrast for better visualization. The middle and right columns show Mask-RCNN and GAIS-Net results, respectively. Each instance is shown in a different color. With the aid of geometric information, GAIS-Net can segment out the person from the overlapping area in the first-row example. In the second-row scenario, Mask-RCNN generates a distorted mask for the smoking motorcyclist because of the cigarette plume, whereas GAIS-Net shows more robust shape control.
  • Figure 2: Network design of our GAIS-Net. Bbox denotes bounding box. We color modules blue and outputs or loss terms orange. In the MaskIoU module, the 2D features and the 2D predicted mask come from the 2D mask head and are fed into the MaskIoU head to regress MaskIoU scores. We draw the MaskIoU head separately for clarity. $\oplus$ stands for concatenation.
  • Figure 3: Undesirable sampling example. The blue areas represent the foreground. Suppose we uniformly sample every grid center point in the left figure, resulting in the point cloud shown in the occupancy grid on the right. Red crosses are undesirable sampling points that lie just outside the foreground object, making the shape after sampling differ from the original one.
  • Figure 4: Inference-time fusion of mask predictions from different representations. We first fuse the 2.5D and 3D masks because both come from the same source (disparity). We then fuse the mask predictions from the image domain and from disparity. $\oplus$ represents concatenation.
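Figures 2 and 4 describe fusing the 2D, 2.5D, and 3D mask predictions with the help of MaskIoU scores, with the two disparity-derived masks fused first. The sketch below illustrates only that ordering, using a deliberately simple stand-in: a per-pixel average of soft masks weighted by each mask's predicted MaskIoU score. This weighting rule, the 0.5 threshold, and the names `fuse_pair` and `fuse_masks` are assumptions for illustration; the fusion in the paper involves concatenation ($\oplus$) as drawn in Figure 4 and is not reproduced here.

```python
from typing import List, Tuple

Mask = List[List[float]]  # per-pixel foreground probabilities in [0, 1]


def fuse_pair(mask_a: Mask, score_a: float, mask_b: Mask, score_b: float) -> Tuple[Mask, float]:
    """Blend two same-sized soft masks, weighting each by its MaskIoU score.

    Hypothetical stand-in, not the paper's fusion module.
    """
    total = score_a + score_b
    if total <= 0.0:
        wa = wb = 0.5  # neither mask is trusted: fall back to a plain average
    else:
        wa, wb = score_a / total, score_b / total
    fused = [
        [wa * pa + wb * pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(mask_a, mask_b)
    ]
    # Carry a score forward so the fused mask can take part in the next stage.
    return fused, wa * score_a + wb * score_b


def fuse_masks(
    mask_2d: Mask, s_2d: float,
    mask_25d: Mask, s_25d: float,
    mask_3d: Mask, s_3d: float,
    thresh: float = 0.5,
) -> List[List[int]]:
    """Two-stage fusion in Figure 4's order: disparity masks first, then the image mask."""
    disp_mask, s_disp = fuse_pair(mask_25d, s_25d, mask_3d, s_3d)
    fused, _ = fuse_pair(mask_2d, s_2d, disp_mask, s_disp)
    return [[1 if p >= thresh else 0 for p in row] for row in fused]


# Usage: the image mask is scored higher, so it dominates where the masks disagree.
print(fuse_masks([[1.0, 0.0]], 0.8, [[0.0, 1.0]], 0.2, [[0.0, 1.0]], 0.2))  # [[1, 0]]
```

Fusing the 2.5D and 3D masks first means the disparity branch enters the final stage as a single prediction with a single score, mirroring the figure's rationale that both come from the same source.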