Exploiting Priors from 3D Diffusion Models for RGB-Based One-Shot View Planning
Sicong Pan, Liren Jin, Xuying Huang, Cyrill Stachniss, Marija Popović, Maren Bennewitz
TL;DR
The paper addresses how to plan informative RGB views for reconstructing an initially unknown object when only a single RGB image of it is available. It introduces a pipeline that uses a 3D diffusion model to generate a mesh from that image as a geometric prior, formulates object-specific one-shot view planning as a set covering optimization with multi-view and distance constraints, and computes the globally shortest path through the selected views to collect RGB images for NeRF reconstruction. Key contributions include (i) leveraging diffusion priors to enable RGB-based one-shot planning, (ii) a customized set covering formulation with alpha-view coverage and beta-distance constraints, and (iii) simulation and real-world experiments showing improved reconstruction quality and reduced movement cost compared to baselines. The approach demonstrates the practical feasibility of diffusion-informed planning in robotics, and the code is released as open source to support reproducibility and further research.
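To make the set covering idea above concrete, the following is a minimal sketch, not the authors' formulation. It assumes that "alpha-view coverage" means every surface point of the prior mesh should be seen by at least alpha selected views, and that "beta-distance" means any two selected views must be at least beta apart; both readings are assumptions. It also swaps the optimization for a simple greedy heuristic, so the result is not guaranteed to be a minimum-size view set. The function name `greedy_view_selection` and the toy `visibility` and `positions` arrays are hypothetical.

```python
import numpy as np

def greedy_view_selection(visibility, positions, alpha=2, beta=0.5):
    """Greedy heuristic for a multi-cover problem with a spacing constraint.

    visibility: (n_views, n_points) boolean matrix, True if a candidate view
                sees a surface point (assumed to be precomputed from the mesh).
    positions:  (n_views, 3) candidate view positions.
    alpha:      assumed meaning: required number of views per surface point.
    beta:       assumed meaning: minimum distance between selected views.
    """
    visibility = np.asarray(visibility, dtype=bool)
    positions = np.asarray(positions, dtype=float)
    n_views = visibility.shape[0]
    # A point seen by fewer than alpha candidates can demand at most that many.
    demand = np.minimum(alpha, visibility.sum(axis=0))
    available = np.ones(n_views, dtype=bool)
    selected = []
    while demand.any():
        # Gain of a view = number of still under-covered points it observes.
        gain = (visibility & (demand > 0)).sum(axis=1)
        gain[~available] = 0
        best = int(np.argmax(gain))
        if gain[best] == 0:
            break  # remaining demand cannot be met under the spacing constraint
        selected.append(best)
        demand = demand - (visibility[best] & (demand > 0)).astype(int)
        # Exclude the chosen view and every candidate closer than beta to it.
        dist = np.linalg.norm(positions - positions[best], axis=1)
        available &= dist >= beta
        available[best] = False
    return selected

# Toy example: five candidate views, three surface points.
positions = np.array([[1, 0, 0], [0, 1, 0], [-1, 0, 0], [0, -1, 0], [0.99, 0.05, 0]])
visibility = np.array([
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
    [0, 0, 0],
    [1, 1, 0],  # near-duplicate of view 0, excluded by the spacing constraint
], dtype=bool)
selected = greedy_view_selection(visibility, positions, alpha=2, beta=0.5)
print(selected)
```

In this toy case every surface point ends up observed by two selected views, and the near-duplicate candidate is skipped because it lies closer than `beta` to an already chosen view.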
Abstract
Object reconstruction is relevant for many autonomous robotic tasks that require interaction with the environment. A key challenge in such scenarios is planning view configurations to collect informative measurements for reconstructing an initially unknown object. One-shot view planning enables efficient data collection by predicting all view configurations at once and planning the globally shortest path connecting them. However, one-shot view planning requires prior knowledge about the object. In this work, we propose a novel one-shot view planning approach that uses the 3D generation capabilities of diffusion models as priors. By incorporating such geometric priors into our pipeline, we achieve effective one-shot view planning starting with only a single RGB image of the object to be reconstructed. Our planning experiments in simulation and real-world setups indicate that our approach achieves a good trade-off between object reconstruction quality and movement cost.
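As an illustration of planning a globally shortest path once all views are known, the sketch below exhaustively searches every visiting order of a small view set, starting from the current camera position. This is an assumption-laden toy, not the solver used in the paper: movement cost is taken to be straight-line Euclidean distance, the path is open (no return to the start), and brute force is only practical for a handful of views. The function name `shortest_view_path` is hypothetical.

```python
from itertools import permutations
import math

def shortest_view_path(start, views):
    """Return (order, length) of the shortest open path from start through all views.

    start: (x, y, z) current camera position.
    views: list of (x, y, z) view positions to visit exactly once.
    Cost is assumed to be Euclidean distance between consecutive positions.
    """
    best_order, best_len = None, math.inf
    for order in permutations(range(len(views))):
        length = 0.0
        prev = start
        for i in order:
            length += math.dist(prev, views[i])
            prev = views[i]
        if length < best_len:
            best_order, best_len = order, length
    return list(best_order), best_len

# Toy example: three views on a line, listed out of order.
start = (0.0, 0.0, 0.0)
views = [(3.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
order, length = shortest_view_path(start, views)
print(order, length)
```

Because the complete view set is available before the robot moves, the visiting order can be optimized over the whole set rather than chosen greedily one view at a time, which is where the movement cost saving of one-shot planning comes from.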
