Beyond the Frontier: Predicting Unseen Walls from Occupancy Grids by Learning from Floor Plans

Ludvig Ericson, Patric Jensfelt

TL;DR

This work addresses predicting unseen indoor walls from partial occupancy grids captured by a 360° LIDAR, framing the task as autoregressive prediction of wall-vertex sequences conditioned on sensor history and floor-plan priors. The proposed Floorist model uses a transformer encoder–decoder with cross-attention, encoding occupancy grids via ViT and visible-wall tokens via discrete embeddings, and generates wall segments as a sequence of tokens with a robust axis-aligned, subdivided representation. Data is synthesized by simulating robot trajectories on KTH floor plans, including axis alignment and segment subdivision, with evaluation via predicted information gain in frontier-based exploration and extensive ablations against a non-predictive baseline and an image-based predictor. The results show Floorist outperforms baselines in predicting information gain and wall geometry, scales with sensor range and grid area, and generalizes to a real-world office environment, highlighting potential for real-time floor-plan inference and improved exploration strategies. The work also provides open-source data and methodology to advance map-predictive exploration research.

Abstract

In this paper, we tackle the challenge of predicting the unseen walls of a partially observed environment as a set of 2D line segments, conditioned on occupancy grids integrated along the trajectory of a 360° LIDAR sensor. A dataset of such occupancy grids and their corresponding target wall segments is collected by navigating a virtual robot between a set of randomly sampled waypoints in a collection of office-scale floor plans from a university campus. The line segment prediction task is formulated as an autoregressive sequence prediction task, and an attention-based deep network is trained on the dataset. The sequence-based autoregressive formulation is evaluated through predicted information gain, as in frontier-based autonomous exploration, demonstrating significant improvements over both non-predictive estimation and convolution-based image prediction found in the literature. Ablations on key components are evaluated, as well as sensor range and the occupancy grid's metric area. Finally, model generality is validated by predicting walls in a novel floor plan reconstructed on-the-fly in a real-world office environment.
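
The autoregressive formulation described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the 0.5 m quantization step, the flat four-coordinates-per-segment token layout, and the helper names `segments_to_tokens`, `teacher_forcing_pair`, and `generate`. It only shows how a set of 2D wall segments might be serialized into a token sequence, how the decoder input and target are offset by one position during training, and how generation repeatedly appends one token until 'end' is produced.

```python
START, END = "start", "end"


def segments_to_tokens(segments, cell=0.5):
    """Serialize segments ((x0, y0), (x1, y1)) in metres into one token list.

    Hypothetical scheme: each coordinate is quantized to an integer cell index.
    """
    tokens = [START]
    for (x0, y0), (x1, y1) in segments:
        tokens += [round(v / cell) for v in (x0, y0, x1, y1)]
    tokens.append(END)
    return tokens


def teacher_forcing_pair(tokens):
    """Drop the last token for the decoder input and the first for the target,
    so that the target at each position is the next token."""
    return tokens[:-1], tokens[1:]


def generate(next_token, max_len=64):
    """Grow a sequence from 'start' by appending one predicted token at a time."""
    seq = [START]
    while len(seq) < max_len:
        tok = next_token(seq)
        seq.append(tok)
        if tok == END:
            break
    return seq


# Two hypothetical wall segments forming an L-shaped corner.
walls = [((0.0, 0.0), (2.0, 0.0)), ((2.0, 0.0), (2.0, 1.5))]
tokens = segments_to_tokens(walls)
decoder_input, target = teacher_forcing_pair(tokens)

# Stand-in for a trained network: replays the known sequence token by token.
replayed = generate(lambda seq: tokens[len(seq)])
```

In a real model the `next_token` callable would sample from the decoder's next-token distribution; the replay function here only exercises the loop.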

Paper Structure (25 sections, 13 equations, 5 figures, 1 table)

Figures (5)

  • Figure 1: Three consecutive occupancy grids with Unknown, Free, Occupied, and Window cells; Predicted walls from a Floorist model; Target walls; and Trajectory. Initially (left), few lines match exactly apart from the visible northern exterior wall; the predicted rooms do exist, though not at the exact locations predicted. In the next step (middle), more of the corridor is observed and the predicted segments improve, though the adjoining rooms are still misaligned and their doorways misplaced. Finally (right), the adjacent rooms are partially observed, the predicted doorway alignment is now correct, and both rooms' widths are correctly adjusted. The images have been cropped for legibility.
  • Figure 2: The Floorist network architecture with training and generation pathways. In training, source and target token sequences are sampled from the dataset, and the Encoder blocks and Decoder blocks are evaluated in one pass. In generation, the source sequence starts as 'start' and the encoder side is evaluated only once. The decoder is then evaluated to obtain the next-token distribution, and a token is sampled from it and appended to the source sequence before the process repeats. $R=|\boldsymbol{t}(\boldsymbol{S}(\boldsymbol{M}))|$ is the number of tokens in the line segments visible in $\boldsymbol{M}$, and $T=|\boldsymbol{\hat{t}}|$ is the number of target tokens. $P$ is the number of patches from ViT, $E$ is the embedding dimension, and $Q, K, V$ are the query, key, and value matrices for a multi-head attention (MHA) layer. $\boldsymbol{t}_{\left[i<T\right]}$ denotes removal of the last token (i.e., 'end'), and $\boldsymbol{t}_{\left[1<i\right]}$ denotes removal of the first token (i.e., 'start'). This shifts the two sequences so that the target is always the next token. Note that each attention block uses a gated residual connection as in ReZero.
  • Figure 3: Illustration of how information gain is computed for a Frontier location found along Trajectory. The initial occupancy grid $\boldsymbol{M}$ is shown in (a), while (b) to (e) show the occupancy grid $\boldsymbol{M}'$ after a simulated sensor scan in the predicted environment has been integrated into $\boldsymbol{M}$, with Information gain cells, Sensor scan, and Walls. Occupancy grid colors as in Figure 1. In (b), walls are extracted from (a), corresponding to the typical way information gain is estimated for a frontier in non-predictive autonomous exploration [bircher2016receding]. In (c), walls are predicted by a model using (a) as input to a U-Net predictor as in [tao2023seer, katyal2019uncertainty], and in (d) with the method proposed in this paper. In (e), the ground truth walls are used. In this example, the naive $\tilde{I}_n$, convolutional $\tilde{I}_d$, and our approach $\tilde{I}_f$ differ from the ground truth $\hat{I}$ by 1593, 770, and 95 (relative differences of 122%, 58.9%, and 7.26%), respectively.
  • Figure 4: Examples of predictions on random samples from the test set. Each row is a section of a single trajectory, in sequence from left to right. Frontier locations used in the information gain evaluation are also shown. Other conventions as in Figure 3. Note that in (b), the predicted occupancy grid is shown with Occupied cells. It is advisable to use a digital document viewer to zoom the vector graphics.
  • Figure 5: Cumulative distribution function $F$ of absolute error $|d| = |\tilde{I} - \hat{I}|$ in predicted information gain $\tilde{I}$ from the true information gain $\hat{I}$ using line segments from Naive, U-Net, and Floorist. $N = 1464140$.
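
The quantity plotted in Figure 5 can be reproduced in a few lines once predicted and true information gains are available. The sketch below is a generic empirical CDF over absolute errors $|d| = |\tilde{I} - \hat{I}|$; the helper name `empirical_cdf` and the four sample values are hypothetical and are not data from the paper.

```python
from bisect import bisect_right


def empirical_cdf(abs_errors):
    """Return a function F where F(x) is the fraction of samples with |d| <= x."""
    ordered = sorted(abs_errors)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one sample")

    def F(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(ordered, x) / n

    return F


# Hypothetical toy values, purely for illustration.
true_gain = [100, 250, 400, 80]
predicted_gain = [110, 200, 400, 180]

abs_err = [abs(p - t) for p, t in zip(predicted_gain, true_gain)]
F = empirical_cdf(abs_err)

for x in (0, 50, 1000):
    print(f"F({x}) = {F(x):.2f}")
```

A curve that rises faster toward 1 means more predictions fall within a small absolute error, which is how the three methods in the figure would be compared.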