MonoPlane: Exploiting Monocular Geometric Cues for Generalizable 3D Plane Reconstruction

Wang Zhao, Jiachen Liu, Sheng Zhang, Yishu Li, Sili Chen, Sharon X Huang, Yong-Jin Liu, Hengkai Guo

TL;DR

This paper uses large-scale pre-trained neural networks to predict depth and surface normals from a single image, then builds a plane reconstruction pipeline on these monocular geometric cues, yielding accurate, robust, and scalable 3D plane detection and reconstruction in the wild.

Abstract

This paper presents a generalizable 3D plane detection and reconstruction framework named MonoPlane. Unlike previous works based on robust estimators (which require multiple images or RGB-D input) and learning-based works (which suffer from domain shift), MonoPlane combines the best of both worlds and establishes a plane reconstruction pipeline based on monocular geometric cues, resulting in accurate, robust, and scalable 3D plane detection and reconstruction in the wild. Specifically, we first leverage large-scale pre-trained neural networks to obtain depth and surface normals from a single image. These monocular geometric cues are then incorporated into a proximity-guided RANSAC framework that fits the plane instances sequentially, one at a time. We model 3D point proximity as a graph within RANSAC to guide plane fitting from noisy monocular depths, and follow this with an image-level joint optimization over all planes to improve consistency among the plane instances. We further design a simple but effective pipeline to extend this single-view solution to sparse-view 3D plane reconstruction. Extensive experiments on multiple datasets demonstrate superior zero-shot generalizability over baselines, achieving state-of-the-art plane reconstruction performance in the transfer setting. Our code is available at https://github.com/thuzhaowang/MonoPlane.
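
To make the first two stages of the abstract concrete, the sketch below back-projects a depth map into camera-frame 3D points and fits a single plane with plain RANSAC, using per-pixel normals as an extra inlier check. It is an illustration only and not the authors' implementation: it uses vanilla RANSAC rather than the proximity-guided graph variant, it omits the joint multi-plane optimization, and the function names, pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), and thresholds are all assumptions chosen for the example. The depth and normals are synthetic stand-ins for network predictions.

```python
import math
import random


def backproject(depth, fx, fy, cx, cy):
    """Lift a depth map (list of rows) to camera-frame 3D points (pinhole model)."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def plane_from_points(p0, p1, p2):
    """Return (unit normal n, offset d) with n . x + d = 0, or None if collinear."""
    a = [p1[i] - p0[i] for i in range(3)]
    b = [p2[i] - p0[i] for i in range(3)]
    n = (a[1] * b[2] - a[2] * b[1],
         a[2] * b[0] - a[0] * b[2],
         a[0] * b[1] - a[1] * b[0])
    norm = math.sqrt(sum(c * c for c in n))
    if norm < 1e-9:
        return None
    n = tuple(c / norm for c in n)
    d = -sum(n[i] * p0[i] for i in range(3))
    return n, d


def ransac_plane(points, normals, dist_thresh=0.02, cos_thresh=0.9,
                 iters=200, seed=0):
    """Vanilla RANSAC: keep the plane hypothesis with the most inliers.

    A point is an inlier if it lies close to the plane AND its own normal
    roughly agrees with the plane normal (hypothetical thresholds).
    """
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iters):
        sample = rng.sample(range(len(points)), 3)
        model = plane_from_points(*(points[i] for i in sample))
        if model is None:
            continue  # degenerate (collinear) sample
        n, d = model
        inliers = []
        for i, (p, pn) in enumerate(zip(points, normals)):
            dist = abs(sum(n[k] * p[k] for k in range(3)) + d)
            agree = abs(sum(n[k] * pn[k] for k in range(3)))
            if dist < dist_thresh and agree > cos_thresh:
                inliers.append(i)
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers


# Synthetic 8x8 "prediction": a fronto-parallel wall at depth 2 with two outliers.
depth = [[2.0] * 8 for _ in range(8)]
normals = [(0.0, 0.0, -1.0)] * 64
depth[0][0], depth[3][5] = 3.0, 3.5
normals[0], normals[3 * 8 + 5] = (1.0, 0.0, 0.0), (1.0, 0.0, 0.0)

points = backproject(depth, fx=10.0, fy=10.0, cx=3.5, cy=3.5)
model, inliers = ransac_plane(points, normals)
```

Here `model` holds the fitted plane as a unit normal and offset, and `inliers` holds the indices of the pixels assigned to it. To extract several planes in sequence, one would remove the inliers and rerun the fit on the remaining points; the proximity graph described in the abstract is meant to make that step robust to the noise in monocular depth, which this plain version does not address.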

Paper Structure

This paper contains 32 sections, 11 equations, 13 figures, 9 tables.

Figures (13)

  • Figure 1: Sample results from diverse scenarios. Our method achieves superior generalizable 3D plane detection and reconstruction by incorporating generalizable cues and robust designs.
  • Figure 2: Pipeline of our proposed method MonoPlane. Given a single image as input, a pre-trained monocular network predicts depth and surface normals, yielding an oriented 3D point cloud. Our proposed proximity-based graph-cut RANSAC is then applied to handle the noisy point cloud and sequentially segment 3D planes, followed by projection and dense CRFs to output plane masks and parameters.
  • Figure 3: Example results of different RANSAC baselines and our method.
  • Figure 4: Extension to sparse views. We first utilize our single-view pipeline to get 3D plane proposals for each image. These plane proposals are then matched and fused into global 3D planes.
  • Figure 5: Qualitative results for single-view plane segmentation. For all methods, we show two samples each from ScanNet [dai2017scannet] (rows 1 and 2), Matterport3D [chang2017matterport3d] (rows 3 and 4), and Synthia [ros2016synthia] (rows 5 and 6).
  • ...and 8 more figures