Volumetric Semantically Consistent 3D Panoptic Mapping
Yang Miao, Iro Armeni, Marc Pollefeys, Daniel Barath
TL;DR
The paper tackles online 3D semantic-instance mapping for unstructured environments by extending a voxel-TSDF framework with (i) semantic prediction confidence integration, (ii) semantically consistent super-point construction, and (iii) graph-based semantic labeling plus instance refinement. The method fuses per-frame 2D panoptic-geometric predictions into a global map, optimizes semantic labels over a super-point graph, and refines instance assignments to reduce under- and over-segmentation, achieving state-of-the-art results on large public datasets. It remains robust when run on SLAM-estimated trajectories and shows that evaluations based on ground-truth (GT) poses substantially overstate real-world performance, underscoring the need for SLAM-based evaluation in future work. The practical impact lies in producing accurate, real-time, scalable 3D semantic maps for autonomous agents in complex scenes, with room for improvement from better 2D panoptic inputs and pose estimation pipelines.
Abstract
We introduce an online 2D-to-3D semantic instance mapping algorithm aimed at generating comprehensive, accurate, and efficient semantic 3D maps suitable for autonomous agents in unstructured environments. The proposed approach is based on a Voxel-TSDF representation used in recent algorithms. It introduces novel ways of integrating semantic prediction confidence during mapping, producing semantically consistent and instance-consistent 3D regions. Further improvements are achieved by graph optimization-based semantic labeling and instance refinement. The proposed method achieves accuracy superior to the state of the art on public large-scale datasets, improving on a number of widely used metrics. We also highlight a shortcoming in the evaluation of recent studies: using the ground truth trajectory as input instead of a SLAM-estimated one substantially affects the accuracy, creating a large gap between the reported results and the actual performance on real-world data.
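To make the idea of integrating semantic prediction confidence during mapping more concrete, here is a minimal Python sketch of confidence-weighted label fusion per voxel. It is not the authors' implementation: the class `VoxelLabelFusion`, its methods, and the simple additive weighting are assumptions chosen for illustration. The sketch only shows why weighting votes by confidence can give a different result than counting them.

```python
from collections import defaultdict


class VoxelLabelFusion:
    """Hypothetical helper: accumulates confidence-weighted semantic votes per voxel."""

    def __init__(self):
        # voxel index -> {semantic label: accumulated confidence}
        self.votes = defaultdict(lambda: defaultdict(float))

    def integrate(self, voxel, label, confidence):
        # Assumption: each 2D prediction contributes its confidence as a vote weight.
        self.votes[voxel][label] += confidence

    def label_of(self, voxel):
        # Return the label with the highest accumulated confidence, or None if unseen.
        scores = self.votes.get(voxel)
        if not scores:
            return None
        return max(scores, key=scores.get)


fusion = VoxelLabelFusion()
voxel = (3, 1, 4)
fusion.integrate(voxel, "chair", 0.9)  # one confident observation
fusion.integrate(voxel, "table", 0.4)  # two uncertain observations
fusion.integrate(voxel, "table", 0.4)
print(fusion.label_of(voxel))
```

With plain vote counting the voxel would be labeled "table" (two votes against one). With confidence weighting, the single confident "chair" observation outweighs the two uncertain ones.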
