Active Implicit Reconstruction Using One-Shot View Planning
Hao Hu, Sicong Pan, Liren Jin, Marija Popović, Maren Bennewitz
TL;DR
This work tackles active object reconstruction under restricted resources by introducing OSVP, a one-shot view planning framework that leverages implicit surface representations. OSVP predicts a minimal set of viewpoints from POCO-refined dense surface points, bridging implicit priors with efficient planning and reducing movement and view counts. A new dataset generation approach labels the smallest view sets via set-covering on refined surfaces, and the OSVP architecture combines a PoinTr backbone with a ViewState Transformer trained through a set-covering objective. Experiments on synthetic and real-world data show that OSVP achieves comparable implicit reconstruction quality with fewer views and lower movement costs, illustrating practical gains for robot-assisted object modeling.
Abstract
Active object reconstruction using autonomous robots is gaining great interest. A primary goal in this task is to maximize the information gained about the object to be reconstructed, given limited on-board resources. Previous view planning methods are inefficient because they rely on an iterative paradigm based on explicit representations, which (1) plans a path only to the next-best view at each step and (2) requires a considerable number of low-gain views in terms of surface coverage. To address these limitations, we propose to integrate implicit representations into One-Shot View Planning (OSVP). The key idea behind our approach is to use implicit representations to recover small missing surface areas instead of observing them with extra views. To this end, we design a deep neural network, named OSVP, that directly predicts a set of views given a dense point cloud refined from an initial sparse observation. To train our OSVP network, we generate supervision labels from dense point clouds refined by implicit representations by solving set covering optimization problems. Simulated experiments show that our method achieves sufficient reconstruction quality, outperforming several baselines under limited view and movement budgets. We further demonstrate the applicability of our approach in a real-world object reconstruction scenario.
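The set covering step used for label generation can be illustrated with a small sketch. This is not the authors' implementation: it swaps in the standard greedy approximation for set cover, which is not guaranteed to return the smallest view set, and the function name `greedy_view_cover`, the `visible` mapping, and the toy visibility data are all hypothetical. It assumes that the surface points each candidate view can observe are already known, for example from ray casting against the refined dense point cloud.

```python
from typing import Dict, List, Set


def greedy_view_cover(visible: Dict[int, Set[int]], num_points: int) -> List[int]:
    """Greedily pick views until every coverable surface point is observed.

    visible[v] is the (assumed precomputed) set of surface point indices
    seen from candidate view v. Points seen by no view are ignored.
    """
    coverable = set().union(*visible.values()) & set(range(num_points))
    uncovered = set(coverable)
    chosen: List[int] = []
    while uncovered:
        # Take the view that observes the most still-uncovered points;
        # iterating in sorted order breaks ties toward the lowest view id.
        best = max(sorted(visible), key=lambda v: len(visible[v] & uncovered))
        chosen.append(best)
        uncovered -= visible[best]
    return chosen


# Toy example: 4 candidate views, 6 surface points (hypothetical data).
visible = {
    0: {0, 1, 2},
    1: {2, 3},
    2: {3, 4, 5},
    3: {0, 5},
}
selected = greedy_view_cover(visible, num_points=6)
print(selected)
```

In this toy case two views suffice to cover all six points. An exact formulation would instead minimize the number of selected views subject to every point being covered by at least one of them, typically as an integer linear program.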
