Active Implicit Reconstruction Using One-Shot View Planning

Hao Hu, Sicong Pan, Liren Jin, Marija Popović, Maren Bennewitz

TL;DR

This work tackles active object reconstruction under restricted resources by introducing OSVP, a one-shot view planning framework that leverages implicit surface representations. OSVP predicts a minimal set of viewpoints from POCO-refined dense surface points, bridging implicit priors with efficient planning and reducing movement and view counts. A new dataset generation approach labels the smallest view sets via set-covering on refined surfaces, and the OSVP architecture combines a PoinTr backbone with a ViewState Transformer trained through a set-covering objective. Experiments on synthetic and real-world data show that OSVP achieves comparable implicit reconstruction quality with fewer views and lower movement costs, illustrating practical gains for robot-assisted object modeling.

Abstract

Active object reconstruction using autonomous robots is attracting growing interest. A primary goal in this task is to maximize the information gathered about the object to be reconstructed, given limited on-board resources. Previous view planning methods are inefficient because they rely on an iterative paradigm based on explicit representations, which (1) plans a path to the next-best view only and (2) requires a considerable number of low-gain views in terms of surface coverage. To address these limitations, we propose to integrate implicit representations into One-Shot View Planning (OSVP). The key idea behind our approach is to use implicit representations to recover the small missing surface areas instead of observing them with extra views. We therefore design a deep neural network, named OSVP, that directly predicts a set of views given a dense point cloud refined from an initial sparse observation. To train our OSVP network, we generate supervision labels by solving set covering optimization problems on dense point clouds refined by implicit representations. Simulated experiments show that our method achieves sufficient reconstruction quality, outperforming several baselines under limited view and movement budgets. We further demonstrate the applicability of our approach in a real-world object reconstruction scenario.
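
The label-generation idea in the abstract (choosing a small set of views that together cover the refined surface) can be illustrated with a toy set-cover routine. This is only a minimal sketch under stated assumptions: the summary does not say which solver the authors use, so a greedy approximation is swapped in here, and the names `greedy_set_cover`, `visibility`, and `surface_points`, as well as the toy visibility data, are hypothetical.

```python
from typing import Dict, List, Set


def greedy_set_cover(visibility: Dict[int, Set[int]], universe: Set[int]) -> List[int]:
    """Greedy approximation of set cover (an assumption, not the paper's solver).

    visibility maps a candidate view id to the ids of the surface points it sees.
    universe is the set of surface point ids that should be covered.
    """
    uncovered = set(universe)
    chosen: List[int] = []
    while uncovered:
        # Pick the view that sees the most still-uncovered points.
        # Iterating over sorted ids makes ties resolve to the lowest view id.
        best = max(sorted(visibility), key=lambda v: len(visibility[v] & uncovered))
        gain = visibility[best] & uncovered
        if not gain:
            break  # the remaining points are not visible from any candidate view
        chosen.append(best)
        uncovered -= gain
    return chosen


# Hypothetical toy data: 5 candidate views, 8 surface points.
visibility = {
    0: {0, 1, 2, 3},
    1: {3, 4, 5},
    2: {5, 6},
    3: {0, 6, 7},
    4: {1, 4},
}
surface_points = set(range(8))

views = greedy_set_cover(visibility, surface_points)
print(views)  # [0, 1, 3]
```

In this toy case three views suffice, and no two views could cover all eight points because the two largest visibility sets hold only seven points in total. A greedy choice is not guaranteed to be minimal in general, so an exact formulation would be needed to produce truly smallest view sets.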

Paper Structure (17 sections, 3 equations, 7 figures, 5 tables)

Figures (7)

  • Figure 1: A comparison of our proposed implicit representation-based one-shot view planning and an explicit representation-based next-best-view (NBV) planning method [pan2022aglobal] under a limited view budget (six in this example). The views (red-green-blue), global path (purple), local path (cyan), and accumulated observed point clouds with their surface coverage ($83\%$ for our method and $90\%$ for the NBV baseline) are shown in the top row. The bottom row shows meshes reconstructed using the implicit representation [Boulch_2022_CVPR]. Compared to the ground truth mesh, our method achieves comparable mesh quality, especially in the handle area of the cup (green circles), with less surface coverage and lower movement cost than the NBV baseline.
  • Figure 2: An example of our online workflow. The robot observes a sparse point cloud at the initial view, which is passed through POCO to generate a refined point cloud. We input the refined point cloud into our OSVP network to predict a set of views for efficiently covering the object. The planned views (red-green-blue) and the initial view (red) are connected by a global path (purple) for the robot to execute.
  • Figure 3: An example solution to the set covering optimization problem (SCOP) for object surface coverage using six views. Each color marks a view and the part of the object surface it covers.
  • Figure 4: OSVP network architecture: PoinTr [yu2021pointr] is used as the backbone to extract features from a point cloud, and a vanilla 1-D transformer [vaswani2017attention] is used to process view states. $\bigoplus$ denotes element-wise addition. MLP stands for fully connected layers. The SCLoss is computed between $\hat{V}_{pred}$ and $V_{gt}$ for training.
  • Figure 5: Test objects and OSVP network training: (a) 3D mesh models of 10 test objects with complex surfaces; (b) precision, recall, and loss on the validation dataset over training epochs.
  • ...and 2 more figures