STAIR: Semantic-Targeted Active Implicit Reconstruction
Liren Jin, Haofei Kuang, Yue Pan, Cyrill Stachniss, Marija Popović
TL;DR
This work tackles object-level understanding for autonomous robots operating in unknown environments by targeting semantically meaningful objects during active 3D reconstruction. It introduces STAIR, a semantic implicit neural framework that learns occupancy, color, and semantic fields via hybrid voxel grids and MLPs, trained online with RGB-D data and 2D labels using differentiable volume rendering. A semantic-aware next-best-view planner combines exploitation of semantic uncertainty with exploration of unknown regions through a utility $U(v) = U_{et}(v) + \varepsilon U_{er}(v)$ to guide measurements toward objects of interest. Experiments across four scenes show STAIR achieving higher PSNR and F1-scores and producing better meshes than semantics-agnostic baselines and an explicit-map baseline, highlighting the advantages of implicit semantic representations for targeted active reconstruction.
Abstract
Many autonomous robotic applications require object-level understanding when deployed. Actively reconstructing objects of interest, i.e. objects with specific semantic meanings, is therefore relevant for a robot to perform downstream tasks in an initially unknown environment. In this work, we propose a novel framework for semantic-targeted active reconstruction using posed RGB-D measurements and 2D semantic labels as input. The key components of our framework are a semantic implicit neural representation and a compatible planning utility function based on semantic rendering and uncertainty estimation, enabling adaptive view planning to target objects of interest. Our planning approach achieves better reconstruction performance in terms of mesh and novel view rendering quality compared to implicit reconstruction baselines that do not consider semantics for view planning. Our framework further outperforms a state-of-the-art semantic-targeted active reconstruction pipeline based on explicit maps, justifying our choice of utilising implicit neural representations to tackle semantic-targeted active reconstruction problems.
