Augmented Environment Representations with Complete Object Models

Krishnananda Prabhu Sivananda, Francesco Verdoja, Ville Kyrki

TL;DR

This work tackles the gap between perception and planning in mobile robotics by building a multi-layer environment representation that includes 2D occupancy, 3D metric-semantic geometry, and an object instance layer with complete object extents. The approach combines online reconstruction via Voxblox and Kimera-Semantics with a deterministic object-shape completion pipeline that matches partial object observations to a ShapeNet CAD-model database using ICP-based pose alignment, replacing partials with matched full models. Experimental results in simulation and on real robots show that model matching can outperform state-of-the-art deep-learning shape completion for unseen object parts, and that the resulting augmented scenes improve navigation safety compared to 2D SLAM alone. The method provides a practical path for integrating rich perceptual knowledge into robotic planning, though it requires scalable object databases and careful handling of segmentation and real-world noise to generalize across environments and object classes.

Abstract

While the 2D occupancy maps commonly used in mobile robotics enable safe navigation in indoor environments, robots need representations of 3D geometry and semantic information to understand and interact with their environment and its inhabitants. Semantic information is crucial for effectively interpreting the meanings humans attribute to different parts of a space, while 3D geometry is important for safety and high-level understanding. We propose a pipeline that can generate a multi-layer representation of indoor environments for robotic applications. The proposed representation includes 3D metric-semantic layers, a 2D occupancy layer, and an object instance layer where known objects are replaced with an approximate model obtained through a novel model-matching approach. The metric-semantic layer and the object instance layer are combined to form an augmented representation of the environment. Experiments show that the proposed shape matching method outperforms a state-of-the-art deep learning method when tasked with completing unseen parts of objects in the scene. The pipeline performance translates well from simulation to the real world, as shown by F1-score analysis, with the accuracy of semantic segmentation using Mask R-CNN acting as the major bottleneck. Finally, we also demonstrate on a real robotic platform how the multi-layer map can be used to improve navigation safety.
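
To make the model-matching idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes each database entry is already available as an N×3 point cloud, uses a plain point-to-point ICP with brute-force nearest neighbours, and picks the model with the lowest residual after alignment. The function names (`best_rigid_transform`, `icp`, `match_model`) and the residual-based selection rule are illustrative assumptions; the paper's actual matching criterion and ShapeNet handling may differ.

```python
import numpy as np


def best_rigid_transform(src, dst):
    """Least-squares R, t such that R @ src_i + t ~ dst_i (Kabsch algorithm)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection so that R is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def nearest_neighbours(src, dst):
    """Brute-force nearest dst point for every src point (fine for small clouds)."""
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
    idx = d2.argmin(axis=1)
    return idx, np.sqrt(d2[np.arange(len(src)), idx])


def icp(partial, model, iters=30):
    """Align a partial observation to a model; return aligned points and RMS residual."""
    pts = partial.copy()
    for _ in range(iters):
        idx, _ = nearest_neighbours(pts, model)
        R, t = best_rigid_transform(pts, model[idx])
        pts = pts @ R.T + t
    _, dist = nearest_neighbours(pts, model)
    return pts, float(np.sqrt((dist ** 2).mean()))


def match_model(partial, database):
    """Return (name, rms) of the database model with the lowest ICP residual.

    `database` is a hypothetical dict mapping model names to N x 3 arrays.
    """
    scores = {name: icp(partial, model)[1] for name, model in database.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]
```

In this sketch, the partial observation would then be swapped for the full point cloud of the returned model, placed using the inverse of the estimated transform. A practical system would likely replace the brute-force search with a k-d tree and add a coarse initial alignment, since ICP only converges from a reasonably close starting pose.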

Paper Structure

This paper contains 17 sections, 2 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: An environment representation built using the proposed framework.
  • Figure 2: Detailed overview of the pipeline.
  • Figure 3: Discrepancies caused by an inaccurate object mask. In both pictures, the marked regions show the erroneous spread of a label (best viewed in color).
  • Figure 4: Simulated office scene. In the 3D geometric model, partial observations of chairs have been replaced with the complete models, and each area marked in red denotes a false positive. A zoomed-in detail of the meeting table in the north-west corner of the environment is also shown (best viewed in color).
  • Figure 5: Performance of PCN on chairs with various degrees of completion. Input (left) and output (right) are in the same pose; the misalignment is due to a failure of the network.
  • ...and 3 more figures