FlatLands: Generative Floormap Completion From a Single Egocentric View

Subhransu S. Bhattacharjee; Dylan Campbell; Rahul Shome

Abstract

A single egocentric image typically captures only a small portion of the floor, yet a complete metric traversability map of the surroundings would better serve applications such as indoor navigation. We introduce FlatLands, a dataset and benchmark for single-view bird's-eye view (BEV) floor completion. The dataset contains 270,575 observations from 17,656 real metric indoor scenes drawn from six existing datasets, with aligned observation, visibility, validity, and ground-truth BEV maps, and the benchmark includes both in- and out-of-distribution evaluation protocols. We compare training-free approaches, deterministic models, ensembles, and stochastic generative models. Finally, we instantiate the task as an end-to-end monocular RGB-to-floormaps pipeline. FlatLands provides a rigorous testbed for uncertainty-aware indoor mapping and generative completion for embodied navigation.

Paper Structure (83 sections, 8 equations, 19 figures, 16 tables)

Introduction
Contributions.
Background and Related Work
Indoor map completion and occupancy anticipation.
Image inpainting and outpainting via generative models.
BEV scene understanding by posterior sampling.
Inference efficiency.
Problem and Framework
Task: Unobserved Floormap Completion
Estimating the Observed Floormap from an Egocentric View
Training Floormap Completion Models
FlatLands Dataset
Sources, scope, and canonical splits.
Data construction pipeline.
Experiments
...and 68 more sections

Figures (19)

Figure 1: Pipeline. From a single RGB image, our model predicts depth and floor segmentation and projects them to BEV, producing observed floor $F_{\text{obs}}$ and unobserved mask $U$. A conditional generator then predicts floormap completions in the unobserved region, while preserving observed evidence.
Figure 2: FlatLands dataset statistics and construction.
Figure 3: Input egocentric RGB (left) and the four aligned $256{\times}256$ binary maps per observation. $F_{\text{obs}}$: observed floor; $U$: valid unobserved; $F^{\star}$: full floor ground truth; $V$: valid workspace. The white marker ($\blacktriangledown$) denotes the fixed camera anchor in BEV.
Figure 4: LaMa-Ensemble vs. FM+XAttn on a multi-room ScanNet scene. Row 1: observed floor $F_{\text{obs}}$ and unobserved mask $U$ condition both models; the four LaMa-Ensemble samples (boxed) and their per-pixel variance $\sigma^2$. Row 2: ground-truth floor $F^{\star}$ and validity mask $V$ used for evaluation; four FM+XAttn samples (boxed) and their $\sigma^2$. LaMa-Ensemble spreads variance uniformly; FM+XAttn concentrates it at layout boundaries.
Figure 5: Qualitative results on the test split. Top: deterministic single-output comparison across three scenes (in-distribution rows 1--2, out-of-distribution row 3). Columns show the observed floor $F_{\text{obs}}$, unobserved mask $U$, ground truth, and predictions from each baseline. These BEV observations are geometrically projected from the 3D mesh and do not involve any RGB input. Bottom: four independent samples, drawn from each stochastic generator for one in-distribution scene, alongside the per-pixel variance $\sigma^2$ (brighter $=$ higher disagreement).
...and 14 more figures
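
The captions above mention two operations that are easy to illustrate on toy data: preserving observed evidence while completing only inside the unobserved mask $U$ (Figure 1), and the per-pixel variance $\sigma^2$ across sampled completions (Figures 4 and 5). The following is a minimal NumPy sketch, not the authors' implementation. The function names `composite_with_observed` and `per_pixel_variance`, the 8x8 toy maps, and the use of random binary maps in place of generator samples are all assumptions made for illustration; the paper's maps are $256{\times}256$, and how the models enforce preservation of observed evidence is not specified in this excerpt.

```python
import numpy as np

def composite_with_observed(pred, f_obs, u_mask):
    """Use the prediction only inside U; keep the observed floor map elsewhere.

    Hypothetical helper: one simple way to preserve observed evidence.
    """
    u = np.asarray(u_mask, dtype=bool)
    return np.where(u, np.asarray(pred), np.asarray(f_obs))

def per_pixel_variance(samples):
    """samples: (K, H, W) binary completions -> (H, W) population variance."""
    return np.asarray(samples, dtype=np.float64).var(axis=0)

# Toy inputs (assumed layout, not taken from the dataset).
rng = np.random.default_rng(0)
H = W = 8
f_obs = np.zeros((H, W), dtype=np.uint8)
f_obs[5:, 2:6] = 1            # observed floor near the camera
u_mask = np.zeros((H, W), dtype=np.uint8)
u_mask[:5, :] = 1             # unobserved region to be completed

# Stand-in for K=4 stochastic generator samples: random binary maps.
raw = rng.integers(0, 2, size=(4, H, W), dtype=np.uint8)
samples = np.stack([composite_with_observed(s, f_obs, u_mask) for s in raw])

# Disagreement map: zero wherever every sample agrees.
sigma2 = per_pixel_variance(samples)
```

Because every sample is composited with the same $F_{\text{obs}}$ outside $U$, `sigma2` is zero there by construction, so any nonzero variance reflects disagreement in the unobserved region only. For binary maps the variance at a pixel is $p(1-p)$, where $p$ is the fraction of samples marking it as floor, so it never exceeds 0.25.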
