Online Diffusion-Based 3D Occupancy Prediction at the Frontier with Probabilistic Map Reconciliation

Alec Reed, Lorin Achey, Brendan Crowe, Bradley Hayes, Christoffer Heckman

TL;DR

This work presents implementation details and results for real-time, online occupancy prediction using a modified diffusion model, and introduces a probabilistic update method for merging predicted occupancy data into running occupancy maps, resulting in an improvement in predicting occupancy at map frontiers compared to previous methods.

Abstract

Autonomous navigation and exploration in unmapped environments remain a significant challenge in robotics due to the difficulty robots face in making commonsense inferences about unobserved geometries. Recent advancements have demonstrated that generative modeling techniques, particularly diffusion models, can enable systems to infer these geometries from partial observation. In this work, we present implementation details and results for real-time, online occupancy prediction using a modified diffusion model. By removing attention-based visual conditioning and visual feature extraction components, we achieve a 73$\%$ reduction in runtime with minimal accuracy reduction. These modifications enable occupancy prediction across the entire map, rather than being limited to the area around the robot where camera data can be collected. We introduce a probabilistic update method for merging predicted occupancy data into running occupancy maps, resulting in a 71$\%$ improvement in predicting occupancy at map frontiers compared to previous methods. Finally, we release our code and a ROS node for on-robot operation <upon publication> at github.com/arpg/sceneSense_ws.

Paper Structure (12 sections, 2 equations, 6 figures, 2 tables)

This paper contains 12 sections, 2 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Onboard Occupancy Prediction and Map Merging: Green voxels represent observed occupancy and red voxels represent predicted occupancy. Gray graph points represent vertices and yellow graph points represent vertices identified as frontier points. (a) The Spot platform is positioned in front of a T-intersection at startup, as shown in the photo of the scene. (b) The map is populated with the observed 3D occupancy data from the LIDAR sensor. (c) Robot-centric (RC) occupancy prediction runs to predict occupancy data around the robot. Then a graph is built over the space to identify frontiers of interest for frontier-centric (FC) occupancy prediction. (d) Finally, the diffusion model predicts the occupancy around the frontier points. These predicted maps are merged into the running map using our probabilistic map update rule.
  • Figure 2: System Block Diagram: Block diagram showing the system design for onboard SceneSense occupancy prediction. The system uses an IMU and a LIDAR sensor to generate odometry and occupancy maps. Once the occupancy map is built, a graph is constructed to evaluate frontier points for occupancy prediction. Local occupancy is then selected around these points and sent to the SceneSense framework, which provides occupancy predictions. These predictions are then merged with the running occupancy map using the probabilistic update rule.
  • Figure 3: Map Merging Example: Process for generating and merging occupancy predictions with an observed map. A graph is generated and evaluated to identify frontier nodes. Then, the frontier nodes are sorted by exploration gain as defined in Eq. \ref{eq:GB_planner}, and $d_m$ (min node spacing) and $n_{max}$ (max frontier prediction nodes) are enforced on the frontier point set. For each frontier point identified for occupancy prediction, local occupancy is selected from the observed map and sent to SceneSense for occupancy prediction. Finally, the predicted maps are merged into the running occupancy map using Eq. \ref{eq:prob_map_merging}.
  • Figure 4: Multi-Prediction Occupancy Merging: SceneSense predicts various occupancy maps based on equivalent input data that form a distribution. This distribution forms a curve where more likely predictions occur more often, and less likely predictions occur infrequently. These predictions are fused into the merged map using Eq. \ref{eq:prob_map_merging}. The resulting merged map naturally filters out the unlikely voxel predictions, forming an extended occupancy map.
  • Figure 5: Example Occupancy Predictions: Scene images at the top of the figure correspond to the 3 pairs of occupancy maps, where (a) corresponds to the top pair of occupancy maps. The left column of occupancy maps shows the vision-only map, while the right column shows the merged vision and prediction maps. (a) Spot approaches a hallway corner and, given the LIDAR mounting position, cannot observe the floor after entering the hall junction. SceneSense is able to fill in the floor as well as missing wall information that was not observed. (b) Spot navigates down a hallway and enters an area with a glass railing above the stairs. SceneSense does not fill the open space, a case where algorithms like hole filling or normal ground expansion may fail. (c) Spot navigates down a hallway, generating predictions along the way. Spot's trajectory is shown in purple, and the identified frontier point is shown in yellow. Beyond providing predictions for the areas that have already been observed, SceneSense generates a frontier prediction at the 4-way intersection. This prediction shows the left side to be a dead end, while a hallway or entryway is predicted on the right. In reality, these openings lead to classrooms, whose doors may be open or closed to robot traversal.
  • ...and 1 more figure
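
The frontier selection and multi-prediction merging described in the captions for Figures 3 and 4 can be illustrated with a minimal Python sketch. The equations those captions cite are not reproduced in this section, so the sketch makes two loudly stated assumptions: the exploration gain of each frontier node is given as a precomputed number, and the merge uses a standard log-odds Bayesian occupancy update as a stand-in for the paper's probabilistic update rule, which may differ. All function names, the parameters `p_occ` and `p_free`, and the 0.5 threshold are hypothetical.

```python
import math


def select_frontiers(candidates, d_m, n_max):
    """Pick frontier points for prediction.

    candidates: list of ((x, y, z), gain) pairs; gain is assumed precomputed.
    d_m: minimum spacing between selected points.
    n_max: maximum number of points to select.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    chosen = []
    for position, _gain in ranked:
        if len(chosen) >= n_max:
            break
        # Keep a point only if it is at least d_m from every kept point.
        if all(math.dist(position, kept) >= d_m for kept in chosen):
            chosen.append(position)
    return chosen


def logit(p):
    return math.log(p / (1.0 - p))


def merge_prediction(log_odds, observed, prediction, p_occ=0.6, p_free=0.4):
    """Fuse one predicted local map into a running log-odds map (in place).

    log_odds: dict voxel -> log-odds of occupancy (missing means 0.0, i.e. p=0.5).
    observed: set of voxels already measured by the sensor; left untouched.
    prediction: dict voxel -> bool (True if predicted occupied).
    Assumed stand-in update, not the paper's equation.
    """
    for voxel, is_occupied in prediction.items():
        if voxel in observed:
            continue
        step = logit(p_occ) if is_occupied else logit(p_free)
        log_odds[voxel] = log_odds.get(voxel, 0.0) + step
    return log_odds


def occupied_voxels(log_odds, threshold=0.5):
    """Voxels whose fused occupancy probability exceeds the threshold."""
    return {v for v, l in log_odds.items() if 1.0 / (1.0 + math.exp(-l)) > threshold}


# Usage: two of four candidates survive the spacing and count limits.
candidates = [((0, 0, 0), 5.0), ((0.5, 0, 0), 4.0), ((3, 0, 0), 3.0), ((6, 0, 0), 1.0)]
frontiers = select_frontiers(candidates, d_m=1.0, n_max=2)

# Five predictions from the same input: voxel "a" is predicted occupied in
# four of them, voxel "b" in only one, so "b" is filtered out after fusion.
running_map = {}
for i in range(5):
    sample = {"a": i != 0, "b": i == 0, "seen": True}
    merge_prediction(running_map, observed={"seen"}, prediction=sample)
extended = occupied_voxels(running_map)
```

Under these assumptions, voxels that appear in most sampled predictions accumulate positive log-odds and survive thresholding, while rarely predicted voxels are driven negative. This mirrors the filtering behavior that the Figure 4 caption attributes to the merged map.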