
AirPlanes: Accurate Plane Estimation via 3D-Consistent Embeddings

Jamie Watson, Filippo Aleotti, Mohamed Sayed, Zawar Qureshi, Oisin Mac Aodha, Gabriel Brostow, Michael Firman, Sara Vicente

TL;DR

The paper addresses the challenge of extracting 3D planar layouts from sequences of posed RGB images by introducing 3D-consistent plane embeddings. A per-scene MLP maps each 3D point $\bm{p}$ to an embedding $\bm{e}_{\bm{p}}$ and is trained online to align with per-image plane cues while maintaining cross-view consistency; geometry from a light 3D reconstruction (via SimpleRecon) provides a mesh enriched with planar probabilities, and a clustering step (RANSAC or mean-shift) groups embeddings and geometry into plane instances. The method yields state-of-the-art results on ScanNetV2, with strong ablations showing the embeddings improve both geometry and segmentation metrics, and online variants achieving interactive speeds suitable for AR/robotics. The work demonstrates that learning 3D semantic priors for planes, coupled with robust geometric priors, can outperform purely geometric baselines and flexible end-to-end systems, while maintaining real-time applicability. Overall, the approach advances plane-aware scene representations by encoding 3D-consistent semantics that facilitate robust plane decomposition in dynamic, multi-view settings.
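To make the idea of a per-scene embedding function concrete, here is a minimal sketch in NumPy. It is not the authors' implementation: the layer sizes, the random (untrained) weights, the camera intrinsics and the two poses are all assumptions chosen for illustration. The only point it shows is that when the embedding is a function of the 3D position, a surface point observed from two different views receives the same embedding once each observation is lifted back into world coordinates.

```python
import numpy as np

def init_mlp(layer_sizes, rng):
    """Random weights for a small fully connected network (sizes are hypothetical)."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def embed(params, points):
    """Map (N, 3) world-space points to (N, D) embeddings."""
    h = np.atleast_2d(points)
    for i, (w, b) in enumerate(params):
        h = h @ w + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)  # ReLU on hidden layers only
    return h

def project(p_world, K, cam_to_world):
    """World point -> (pixel, depth) for a pinhole camera."""
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    p_cam = R.T @ (p_world - t)
    uvw = K @ p_cam
    return uvw[:2] / uvw[2], p_cam[2]

def backproject(pixel, depth, K, cam_to_world):
    """(pixel, depth) -> world point; the inverse of project()."""
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    p_cam = depth * (np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0]))
    return R @ p_cam + t

rng = np.random.default_rng(0)
params = init_mlp([3, 32, 32, 8], rng)  # untrained stand-in for the per-scene MLP
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])

pose_a = np.eye(4)                       # camera A at the origin
pose_b = np.eye(4)                       # camera B: shifted and slightly rotated
c, s = np.cos(0.1), np.sin(0.1)
pose_b[:3, :3] = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
pose_b[:3, 3] = [0.5, 0.0, 0.0]

p = np.array([0.2, -0.1, 2.0])           # one surface point in world coordinates
p_a = backproject(*project(p, K, pose_a), K, pose_a)
p_b = backproject(*project(p, K, pose_b), K, pose_b)
e_a, e_b = embed(params, p_a), embed(params, p_b)
print(np.allclose(e_a, e_b))
```

The two embeddings agree up to floating-point error regardless of the weights, which is the sense in which such an embedding is 3D consistent by construction. Making points on the same plane land close together in embedding space is what the training has to achieve, and that part is not sketched here.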

Abstract

Extracting planes from a 3D scene is useful for downstream tasks in robotics and augmented reality. In this paper we tackle the problem of estimating the planar surfaces in a scene from posed images. Our first finding is that a surprisingly competitive baseline results from combining popular clustering algorithms with recent improvements in 3D geometry estimation. However, such purely geometric methods are understandably oblivious to plane semantics, which are crucial to discerning distinct planes. To overcome this limitation, we propose a method that predicts multi-view consistent plane embeddings that complement geometry when clustering points into planes. We show through extensive evaluation on the ScanNetV2 dataset that our new method outperforms existing approaches and our strong geometric baseline for the task of plane estimation.
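As a rough illustration of what a purely geometric baseline can look like, the following is a minimal sketch of sequential RANSAC plane fitting on a point set. The thresholds, the iteration count and the noise-free toy scene are assumptions for illustration, and the points stand in for geometry that would come from a 3D reconstruction; this is not the baseline's actual configuration.

```python
import numpy as np

def sequential_ransac(points, dist_thresh=0.02, min_inliers=50, iters=200, seed=0):
    """Greedily peel off planes: fit the best plane to the unassigned points,
    label its inliers, and repeat until too few points remain."""
    rng = np.random.default_rng(seed)
    labels = np.full(len(points), -1)
    plane_id = 0
    while True:
        free = np.flatnonzero(labels == -1)
        if len(free) < min_inliers:
            break
        best = np.empty(0, dtype=int)
        for _ in range(iters):
            a, b, c = points[rng.choice(free, 3, replace=False)]
            n = np.cross(b - a, c - a)
            norm = np.linalg.norm(n)
            if norm < 1e-9:          # skip (near-)collinear samples
                continue
            n = n / norm
            dist = np.abs((points[free] - a) @ n)  # point-to-plane distance
            inliers = free[dist < dist_thresh]
            if len(inliers) > len(best):
                best = inliers
        if len(best) < min_inliers:
            break
        labels[best] = plane_id
        plane_id += 1
    return labels

# Toy scene (hypothetical): a floor patch at z = 0 and a wall patch at x = 0.
rng = np.random.default_rng(1)
floor = np.column_stack([rng.uniform(0, 1, (200, 2)), np.zeros(200)])
wall = np.column_stack([np.zeros(200), rng.uniform(0, 1, (200, 2))])
points = np.vstack([floor, wall])

labels = sequential_ransac(points)
print(len(set(labels[labels >= 0].tolist())))  # number of planes found
```

Note that the inlier test only looks at point-to-plane distance. Two distinct surfaces lying in the same geometric plane are indistinguishable to it, which is the kind of case where plane semantics are needed.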

Paper Structure (25 sections, 1 equation, 7 figures, 3 tables)


Figures (7)

  • Figure 1: We create planar scene representations using only posed RGB images as input. Existing systems can predict per-pixel planar embeddings for each image, but these are not 3D consistent. We learn a per-scene function which maps points on the same plane to nearby positions in an embedding space. Clustering these embeddings, using strong geometrical priors, gives accurate planar reconstructions.
  • Figure 2: Per-image planar embeddings are not temporally consistent. While they can segment planes within a single image, plane embeddings in (b) from Yu et al. [yu2019single] do not result in 3D consistent embeddings for a full scene. Our method (c) gives a per-scene embedding which is consistent across many views of that scene.
  • Figure 3: Our method for 3D plane estimation. For each RGB keyframe we estimate per-pixel depth, planar probability and planar embedding following Yu et al. [yu2019single]. We fuse the depths and planar probabilities into a TSDF and extract a mesh. We then train a per-scene MLP to distill the per-pixel embeddings into 3D-consistent embeddings. These are finally grouped via clustering into 3D planes.
  • Figure 4: Planes can be estimated online at interactive rates. As new RGB frames are acquired, we can update the weights of our MLP and recompute plane assignments. See the timings section for details.
  • Figure 5: Sequential RANSAC alone is not enough to segment planar instances. Sequential RANSAC (with geometry from SimpleRecon [sayed2022simplerecon]) does well, but fails to segment adjacent co-planar instances. Our method can segment these, e.g., this picture frame.
  • ...and 2 more figures
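The co-planar failure case described in the Figure 5 caption can be illustrated with a single RANSAC-style inlier test. This is a hedged sketch, not the paper's clustering procedure: the function `plane_inliers`, the embedding threshold, the 2D toy embeddings and the idea of gating inliers by distance to the mean embedding of the seed points are all assumptions made for illustration.

```python
import numpy as np

def plane_inliers(points, embeddings, seed_idx, dist_thresh=0.02, emb_thresh=None):
    """Inlier mask for the plane through three seed points.
    If emb_thresh is given, inliers must also lie close to the seeds'
    mean embedding (an assumed gating rule, for illustration only)."""
    a, b, c = points[seed_idx]
    n = np.cross(b - a, c - a)
    n = n / np.linalg.norm(n)
    ok = np.abs((points - a) @ n) < dist_thresh       # geometric test
    if emb_thresh is not None:
        centre = embeddings[seed_idx].mean(axis=0)
        ok &= np.linalg.norm(embeddings - centre, axis=1) < emb_thresh
    return ok

# Toy scene (hypothetical): a wall and a picture frame, both lying in x = 0.
rng = np.random.default_rng(0)
n_wall, n_frame = 300, 60
wall = np.column_stack([np.zeros(n_wall), rng.uniform(0, 2, (n_wall, 2))])
frame = np.column_stack([np.zeros(n_frame), rng.uniform(0.8, 1.2, (n_frame, 2))])
points = np.vstack([wall, frame])

# Made-up 2D embeddings: wall points cluster near (1, 0), frame points near (0, 1).
embeddings = np.vstack([
    np.array([1.0, 0.0]) + 0.05 * rng.normal(size=(n_wall, 2)),
    np.array([0.0, 1.0]) + 0.05 * rng.normal(size=(n_frame, 2)),
])

seed_idx = np.array([0, 1, 2])  # three wall points define the hypothesis
geometry_only = plane_inliers(points, embeddings, seed_idx)
with_embeddings = plane_inliers(points, embeddings, seed_idx, emb_thresh=0.5)
print(geometry_only.sum(), with_embeddings.sum())
```

With geometry alone, every wall and frame point passes the test and the two instances merge. With the embedding gate, only the wall points are accepted, leaving the frame to be picked up as its own instance by a later hypothesis.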