Augmented Environment Representations with Complete Object Models
Krishnananda Prabhu Sivananda, Francesco Verdoja, Ville Kyrki
TL;DR
This work tackles the gap between perception and planning in mobile robotics by building a multi-layer environment representation that includes 2D occupancy, 3D metric-semantic geometry, and an object instance layer with complete object extents. The approach combines online reconstruction via Voxblox and Kimera-Semantics with a deterministic object-shape completion pipeline that matches partial object observations to a ShapeNet CAD-model database using ICP-based pose alignment, replacing partials with matched full models. Experimental results in simulation and on real robots show that model matching can outperform state-of-the-art deep-learning shape completion for unseen object parts, and that the resulting augmented scenes improve navigation safety compared to 2D SLAM alone. The method provides a practical path for integrating rich perceptual knowledge into robotic planning, though it requires scalable object databases and careful handling of segmentation and real-world noise to generalize across environments and object classes.
Abstract
While 2D occupancy maps commonly used in mobile robotics enable safe navigation in indoor environments, robots that must understand and interact with their environment and its inhabitants also require a representation of 3D geometry and semantic information. Semantic information is crucial for effectively interpreting the meanings humans attribute to different parts of a space, while 3D geometry is important for safety and high-level understanding. We propose a pipeline that can generate a multi-layer representation of indoor environments for robotic applications. The proposed representation includes 3D metric-semantic layers, a 2D occupancy layer, and an object instance layer where known objects are replaced with an approximate model obtained through a novel model-matching approach. The metric-semantic layer and the object instance layer are combined to form an augmented representation of the environment. Experiments show that the proposed shape matching method outperforms a state-of-the-art deep learning method when tasked with completing unseen parts of objects in the scene. The pipeline performance translates well from simulation to the real world, as shown by F1-score analysis, with the accuracy of Mask R-CNN semantic segmentation acting as the major bottleneck. Finally, we also demonstrate on a real robotic platform how the multi-layer map can be used to improve navigation safety.
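
To make the model-matching idea more concrete, the following is a minimal sketch of how a partial object observation might be aligned to each candidate in a model database with point-to-point ICP, keeping the candidate with the lowest residual. It is an illustration under stated assumptions, not the pipeline's actual implementation: point clouds are assumed to be plain (N, 3) NumPy arrays, nearest neighbours are found by brute force, and the function names (`icp`, `match_model`) as well as the use of the final RMSE as the matching score are hypothetical choices made for this example.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst (Kabsch)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

def nearest_neighbors(src, dst):
    """Brute-force nearest neighbour in dst for every point in src."""
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
    idx = d2.argmin(axis=1)
    return idx, np.sqrt(d2[np.arange(len(src)), idx])

def icp(partial, model, iters=30):
    """Align a partial observation to a model; return R, t and the final RMSE."""
    R_total, t_total = np.eye(3), np.zeros(3)
    moved = partial.copy()
    for _ in range(iters):
        idx, _ = nearest_neighbors(moved, model)
        R, t = best_rigid_transform(moved, model[idx])
        moved = moved @ R.T + t
        # Compose the incremental transform with the accumulated one.
        R_total, t_total = R @ R_total, R @ t_total + t
    _, dist = nearest_neighbors(moved, model)
    return R_total, t_total, float(np.sqrt((dist ** 2).mean()))

def match_model(partial, database):
    """Pick the database entry whose ICP alignment leaves the smallest RMSE.

    `database` is a hypothetical dict mapping a model name to an (N, 3) array
    of points sampled from that model's surface.
    """
    results = {name: icp(partial, pts) for name, pts in database.items()}
    best = min(results, key=lambda name: results[name][2])
    return best, results[best]
```

In such a sketch, the pose returned for the best candidate would be the one used to place the full model in the object instance layer in place of the partial observation. Plain ICP only converges from a reasonable initial pose, so a practical system would likely need a coarse alignment step and a spatial index for the neighbour search.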
