STAIR: Semantic-Targeted Active Implicit Reconstruction

Liren Jin, Haofei Kuang, Yue Pan, Cyrill Stachniss, Marija Popović

TL;DR

This work tackles object-level understanding for autonomous robots operating in unknown environments by targeting semantically meaningful objects during active 3D reconstruction. It introduces STAIR, a semantic implicit neural framework that learns occupancy, color, and semantic fields via hybrid voxel grids and MLPs, trained online with RGB-D data and 2D labels using differentiable volume rendering. A semantic-aware next-best-view planner combines exploitation of semantic uncertainty with exploration of unknown regions through a utility $U(v) = U_{et}(v) + \varepsilon U_{er}(v)$ to guide measurements toward objects of interest. Experiments across four scenes show STAIR achieving higher PSNR and F1-scores and producing better meshes than semantics-agnostic baselines and an explicit-map baseline, highlighting the advantages of implicit semantic representations for targeted active reconstruction.
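As a rough illustration of the view-selection step, the sketch below scores each candidate view with $U(v) = U_{et}(v) + \varepsilon U_{er}(v)$ and picks the highest-scoring one. It assumes the exploitation and exploration terms have already been computed per candidate (this summary does not say how they are derived), and the function name, argument names, and the default value of `eps` are hypothetical rather than taken from the paper.

```python
from typing import List, Sequence, Tuple


def select_next_view(
    u_exploit: Sequence[float],
    u_explore: Sequence[float],
    eps: float = 0.5,  # hypothetical default; the trade-off weight is not given here
) -> Tuple[int, List[float]]:
    """Return the index of the best candidate view and all utilities.

    u_exploit[i]: assumed precomputed exploitation score U_et of view i
                  (e.g. uncertainty on the objects of interest).
    u_explore[i]: assumed precomputed exploration score U_er of view i
                  (e.g. overall uncertainty of the view).
    """
    if len(u_exploit) != len(u_explore):
        raise ValueError("both score lists must cover the same candidates")
    if len(u_exploit) == 0:
        raise ValueError("at least one candidate view is required")

    # U(v) = U_et(v) + eps * U_er(v) for every candidate view v
    utilities = [et + eps * er for et, er in zip(u_exploit, u_explore)]

    # The next measurement is taken at the candidate with the highest utility
    best_index = max(range(len(utilities)), key=utilities.__getitem__)
    return best_index, utilities


if __name__ == "__main__":
    best, scores = select_next_view([0.2, 0.9, 0.1], [0.5, 0.0, 0.4], eps=0.5)
    print(best, scores)
```

A larger `eps` shifts the choice toward views that cover unknown regions, while a smaller one keeps the camera focused on the objects of interest.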

Abstract

Many autonomous robotic applications require object-level understanding when deployed. Actively reconstructing objects of interest, i.e. objects with specific semantic meanings, is therefore relevant for a robot to perform downstream tasks in an initially unknown environment. In this work, we propose a novel framework for semantic-targeted active reconstruction using posed RGB-D measurements and 2D semantic labels as input. The key components of our framework are a semantic implicit neural representation and a compatible planning utility function based on semantic rendering and uncertainty estimation, enabling adaptive view planning to target objects of interest. Our planning approach achieves better reconstruction performance in terms of mesh and novel view rendering quality compared to implicit reconstruction baselines that do not consider semantics for view planning. Our framework further outperforms a state-of-the-art semantic-targeted active reconstruction pipeline based on explicit maps, justifying our choice of utilising implicit neural representations to tackle semantic-targeted active reconstruction problems.

Paper Structure (15 sections, 10 equations, 8 figures)

This paper contains 15 sections, 10 equations, 8 figures.

Figures (8)

  • Figure 1: Our novel active implicit reconstruction approach targets an object of interest (car) in an unknown environment. We incorporate semantics and uncertainty estimation into our pipeline, enabling view planning to acquire information about the object in a targeted way. The red bounding box identifies the target object. The green line shows the planned path, with pyramids indicating view frustums. By integrating semantics into our implicit neural representation, we extract a mesh and render novel views only for the object of interest, as exemplified in the bottom row.
  • Figure 2: Overview of our proposed framework, STAIR. We incrementally train our semantic implicit neural representation using posed RGB-D measurements and their 2D semantic labels. After training, we render semantics and uncertainty at sampled candidate views. For planning, our utility function considers both overall view uncertainty and the uncertainty from objects of interest. We select the candidate view with the highest utility value as our next measurement location. We iterate between map representation training and view planning until a maximum allowable number of measurements is reached.
  • Figure 3: Four different scenes used in our main planning experiments. The semantic classes of interest are: car for Scene 1, camera for Scene 2, sofa for Scene 3, and car and airplane for Scene 4.
  • Figure 4: Comparison of reconstruction quality of objects of interest using different planning strategies in the four test scenes shown in Figure 3. We report the average PSNR and F1-score at each planning step. Solid lines show means over $5$ trials and shaded regions indicate standard deviations. Our semantic-targeted approach exploits semantics in our implicit neural representation to achieve targeted view planning, leading to better and more stable reconstruction performance.
  • Figure 5: Qualitative results using our framework showing how novel view rendering (top) and meshes (bottom) improve along planning steps during a mission. Our approach collects information about objects of interest in a targeted way to achieve high-quality reconstruction.
  • ...and 3 more figures