SALVe: Semantic Alignment Verification for Floorplan Reconstruction from Sparse Panoramas

John Lambert, Yuguang Li, Ivaylo Boyadzhiev, Lambert Wixson, Manjunath Narayana, Will Hutchcroft, James Hays, Frank Dellaert, Sing Bing Kang

TL;DR

This work proposes a new system for automatic 2D floorplan reconstruction that is enabled by SALVe, the authors' novel pairwise learned alignment verifier, showing that it outperforms state-of-the-art SfM systems in completeness by over 200%, without sacrificing accuracy.

Abstract

We propose a new system for automatic 2D floorplan reconstruction that is enabled by SALVe, our novel pairwise learned alignment verifier. The inputs to our system are sparsely located 360$^\circ$ panoramas, whose semantic features (windows, doors, and openings) are inferred and used to hypothesize pairwise room adjacency or overlap. SALVe initializes a pose graph, which is subsequently optimized using GTSAM. Once the room poses are computed, room layouts are inferred using HorizonNet, and the floorplan is constructed by stitching the most confident layout boundaries. We validate our system qualitatively and quantitatively as well as through ablation studies, showing that it outperforms state-of-the-art SfM systems in completeness by over 200%, without sacrificing accuracy. Our results point to the significance of our work: poses of 81% of panoramas are localized in the first 2 connected components (CCs), and 89% in the first 3 CCs. Code and models are publicly available at https://github.com/zillow/salve.
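The connected-component statistic quoted above can be illustrated with a small sketch. It assumes each pairwise alignment hypothesis has already been scored by a verifier, keeps only edges at or above a confidence threshold, and reports the fraction of panoramas that fall in the largest `top_k` components. The function names, the edge format, and the example scores are hypothetical; the 0.93 default mirrors the 93% operating point mentioned in the Figure 5 caption. This is not the released SALVe code, and it skips the pose-graph optimization step entirely.

```python
from collections import Counter


def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def cc_coverage(num_panos, scored_edges, threshold=0.93, top_k=2):
    """Fraction of panoramas in the top_k largest connected components.

    scored_edges: iterable of (i, j, confidence) tuples (hypothetical format),
    where i and j are panorama indices in [0, num_panos).
    """
    parent = list(range(num_panos))
    for i, j, conf in scored_edges:
        if conf >= threshold:  # keep only accepted alignments
            ri, rj = find(parent, i), find(parent, j)
            if ri != rj:
                parent[ri] = rj  # merge the two components
    # Component sizes, counting unconnected panoramas as singletons.
    sizes = Counter(find(parent, i) for i in range(num_panos))
    largest = sorted(sizes.values(), reverse=True)[:top_k]
    return sum(largest) / num_panos


# Made-up scores for six panoramas; two edges fall below the threshold.
example_edges = [
    (0, 1, 0.99),
    (1, 2, 0.95),
    (3, 4, 0.97),
    (4, 5, 0.40),
    (2, 3, 0.10),
]
```

With these made-up scores the accepted edges form components {0, 1, 2}, {3, 4}, and {5}, so `cc_coverage(6, example_edges, top_k=2)` evaluates to 5/6.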

Paper Structure (40 sections, 5 equations, 17 figures, 9 tables, 1 algorithm)


Figures (17)

  • Figure 1: A challenging wide-baseline scenario where traditional SfM systems that rely upon keypoint feature matches struggle, but where we succeed by exploiting semantic features (windows, doors, and openings, or W/D/O). We infer layout and hypothesize plausible pairwise relative poses, which are then accepted or rejected by feeding top-down aligned renderings into our learned SALVe verifier. Our global pose estimation has high completeness, leading to dramatic improvements in floorplan reconstruction (indicated by colored regions) vs. state-of-the-art systems such as OpenMVG [Moulon16iwrrpr_OpenMVG] and OpenSfM [Gargallo16github_OpenSfM]. For this hallway/entryway pano pair, SALVe easily validates a relative pose that was generated by grounding on a hallway opening feature.
  • Figure 2: Overview of our floorplan reconstruction system. "BEV" = "bird's eye view". Blue boxes are processing components, gray boxes are data. Trapezoids denote components based on deep networks; lighter blue networks are trained by us. 'Image Room Layout' represents the image coordinates of the floor-wall boundary (at each panorama column). $n$ is the number of panoramas and $k$ is the average number of detected windows/doors/openings per panorama. We show rendered floor and ceiling texture maps for a consistently-aligned pair of panoramas.
  • Figure 3: Generating training samples. Orthographic BEVs of given panoramas, after semantic alignment proposal. Red arrows indicate the W/D/O used to generate the pose proposals. Column 1: Example of an extreme-baseline pair. Column 2: Overlaid floor (top) and ceiling (bottom). Column 3: Example of a wide-baseline pair. Column 4: Overlaid floor (top) and ceiling (bottom).
  • Figure 4: An example of different stages of floorplan reconstruction: Left: Estimated positions of panorama centers. Center: Grouped panoramas with estimated dense room layouts. Panorama centers with the same color are part of the same group. Notice that each open space is grouped together. Distinct groups correspond largely to physical rooms separated by doors. Right: The final floorplan after highest-confidence contour extraction is applied to each group. Each contour is filled with a unique color.
  • Figure 5: Precision-recall analysis of SALVe. Left: curve for SALVe under different inputs ('layout-only' refers to a model with access only to estimated room geometry, but no floor or ceiling texture). Center: Comparison of confidence thresholds versus their effect on precision and recall. The purple line indicates our operating point (93% confidence). Right: Classification accuracy vs. visual overlap for the GT positive class only from SE(2) alignments generated from predicted W/D/O's. Small visual overlap often corresponds to "extreme" baselines.
  • ...and 12 more figures
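
The captions above refer to SE(2) pose proposals generated from W/D/O detections. A minimal sketch of one way to form such a proposal follows. It assumes each W/D/O is represented as a 2D line segment (two endpoints) in its own panorama's top-down frame, and that both segments have the same length. The function names and the two-hypothesis endpoint ordering are illustrative assumptions, not the authors' implementation.

```python
import math


def se2_from_segments(seg_a, seg_b):
    """Return (theta, tx, ty) mapping B-frame points into A's frame.

    The transform rotates seg_b to match seg_a's direction, then
    translates so that the two segment midpoints coincide.
    """
    (ax1, ay1), (ax2, ay2) = seg_a
    (bx1, by1), (bx2, by2) = seg_b
    theta = math.atan2(ay2 - ay1, ax2 - ax1) - math.atan2(by2 - by1, bx2 - bx1)
    c, s = math.cos(theta), math.sin(theta)
    amx, amy = (ax1 + ax2) / 2, (ay1 + ay2) / 2
    bmx, bmy = (bx1 + bx2) / 2, (by1 + by2) / 2
    tx = amx - (c * bmx - s * bmy)
    ty = amy - (s * bmx + c * bmy)
    return theta, tx, ty


def apply_se2(pose, pt):
    """Apply a (theta, tx, ty) transform to a 2D point."""
    theta, tx, ty = pose
    c, s = math.cos(theta), math.sin(theta)
    x, y = pt
    return (c * x - s * y + tx, s * x + c * y + ty)


def propose_alignments(seg_a, seg_b):
    """Two hypotheses: endpoints matched in order and in reversed order
    (e.g. a door observed from opposite sides)."""
    return [
        se2_from_segments(seg_a, seg_b),
        se2_from_segments(seg_a, seg_b[::-1]),
    ]


# Hypothetical door segments, each in its own panorama's frame.
door_in_a = ((1.0, 2.0), (2.0, 2.0))
door_in_b = ((0.0, -1.0), (0.0, -2.0))
pose_b_to_a = se2_from_segments(door_in_a, door_in_b)
```

Each hypothesis returned by `propose_alignments` would then be rendered and passed to a verifier, which accepts or rejects it; that step is outside this sketch.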