Into the Unknown: Towards using Generative Models for Sampling Priors of Environment Uncertainty for Planning in Configuration Spaces

Subhransu S. Bhattacharjee, Hao Lu, Dylan Campbell, Rahul Shome

TL;DR

The paper tackles planning under partial observability by introducing a generative-prior pipeline that samples 3D environment representations conditioned on partial observations. It formalizes spatio-semantic priors and environment samplers, then implements a staged pipeline (VLM prompting, image-based generation, depth estimation, and 3D back-projection) to produce RGB-D point clouds with occupancy and target semantics for $SE(2)$ planning. Across 10 doorway-occluded Matterport3D scenes, the approach yields diverse, semantically plausible samples that enable a robust planner to maximize task success probability, with constrained prompting providing the strongest semantic recovery. While promising, the work acknowledges biases in pretrained models and runtime costs, outlining future work in real-robot deployment and active perception to broaden applicability.
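To make the idea of maximizing task success probability over sampled environments concrete, here is a minimal Python sketch. It is an illustration under simplifying assumptions, not the paper's planner: each sampled environment is reduced to a set of occupied grid cells plus a target cell, candidate paths are given rather than planned in $SE(2)$, and the names `path_succeeds`, `success_probability`, and `most_robust_path` are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Cell = Tuple[int, int]
# One sampled environment: (occupied cells, target cell). This is an assumed,
# simplified stand-in for a sampled point cloud with occupancy and target labels.
Sample = Tuple[Set[Cell], Cell]


def path_succeeds(path: Sequence[Cell], occupied: Set[Cell], target: Cell) -> bool:
    """A path succeeds in one sample if it avoids all occupied cells
    and ends at that sample's target cell."""
    return all(cell not in occupied for cell in path) and path[-1] == target


def success_probability(path: Sequence[Cell], samples: Sequence[Sample]) -> float:
    """Monte Carlo estimate: fraction of samples in which the path succeeds."""
    hits = sum(path_succeeds(path, occ, tgt) for occ, tgt in samples)
    return hits / len(samples)


def most_robust_path(candidates: Sequence[List[Cell]],
                     samples: Sequence[Sample]) -> List[Cell]:
    """Pick the candidate with the highest estimated success probability."""
    return max(candidates, key=lambda p: success_probability(p, samples))


# Three hypothetical samples that disagree on obstacles and on the target.
samples: List[Sample] = [
    ({(1, 0)}, (2, 2)),
    ({(1, 0), (1, 1)}, (2, 2)),
    ({(1, 1)}, (2, 1)),
]
path_top = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
path_bottom = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]

best = most_robust_path([path_top, path_bottom], samples)
```

In this toy setup `path_bottom` succeeds in two of the three samples while `path_top` succeeds in none, so the selection favors the path that is safe under most of the sampled hypotheses.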

Abstract

Priors are vital for planning under partial observability, yet difficult to obtain in practice. We present a sampling-based pipeline that leverages large-scale pretrained generative models to produce probabilistic priors capturing environmental uncertainty and spatio-semantic relationships in a zero-shot manner. Conditioned on partial observations, the pipeline recovers complete RGB-D point cloud samples with occupancy and target semantics, formulated to be directly useful in configuration-space planning. We establish a Matterport3D benchmark of rooms partially visible through doorways, where a robot must navigate to an unobserved target object. Effective priors for this setting must represent both occupancy and target-location uncertainty in unobserved regions. Experiments show that our approach recovers commonsense spatial semantics consistent with ground truth, yielding diverse, clean 3D point clouds usable in motion planning. These results highlight the promise of generative models as a rich source of priors for robotic planning.
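As a rough illustration of how an RGB-D sample can become planner-ready occupancy, the sketch below back-projects a depth map with a standard pinhole camera model and bins the resulting points into ground-plane cells. The intrinsics (`fx`, `fy`, `cx`, `cy`), the camera-frame convention (x right, y down, z forward), the height band, and the function names are all assumptions made for this example; the paper's own back-projection and occupancy extraction may differ.

```python
import math
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, float, float]


def back_project(depth: Sequence[Sequence[float]],
                 fx: float, fy: float, cx: float, cy: float) -> List[Point]:
    """Pinhole back-projection: pixel (u, v) with depth z maps to
    x = (u - cx) * z / fx, y = (v - cy) * z / fy in the camera frame."""
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0.0:  # treat non-positive depth as missing
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def occupancy_cells(points: Sequence[Point], cell_size: float,
                    y_min: float, y_max: float) -> Set[Tuple[int, int]]:
    """Keep points whose camera-frame y lies in [y_min, y_max] and bin
    them into (x, z) ground-plane cells of side cell_size."""
    cells: Set[Tuple[int, int]] = set()
    for x, y, z in points:
        if y_min <= y <= y_max:
            cells.add((math.floor(x / cell_size), math.floor(z / cell_size)))
    return cells


# Tiny 2x2 depth map with one missing pixel (hypothetical values, in metres).
depth = [[2.0, 2.0],
         [0.0, 4.0]]
points = back_project(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
cells = occupancy_cells(points, cell_size=1.0, y_min=-1.5, y_max=0.0)
```

Here the height filter drops the point with `y = 2.0`, leaving two occupied cells. Repeating this for each generated sample would give a set of occupancy hypotheses over the unobserved region.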

Paper Structure

This paper contains 14 sections, 1 equation, 3 figures, 3 tables.

Figures (3)

  • Figure 1: The two images in the top row are the partial views for the office (left) and bedroom (right). Shown alongside are two simulated motions from an uncertainty-aware planner using priors generated by our pipeline. The three sections below show, from top to bottom, intermediate outputs of the proposed pipeline: the expanded RGB images, the monocular depth for those images, and the expanded point cloud samples. Each section shows three samples per row, one scene per row: office first, then bedroom.
  • Figure 2: A generative model pipeline that provides structured 3D priors for reasoning and planning beyond the FoV and uncovering the occluded part of the scene. Samples are generated in 2D along with segmentation maps and depth maps, which are back-projected into 3D and provided as inputs to the planner.
  • Figure 3: Motivating examples from Matterport where large portions of rooms are occluded or visible only through doorways. Crops are shown in bright yellow.