Volumetric Mapping with Panoptic Refinement via Kernel Density Estimation for Mobile Robots

Khang Nguyen, Tuan Dang, Manfred Huber

TL;DR

The paper tackles 3D panoptic mapping for mobile robots by refining RGB-based segmentation masks through depth-driven kernel density estimation (KDE) and integrating the results with projective signed distance functions (SDFs). The core idea is to fill depth holes, reject depth outliers non-parametrically, and refine masks via KDE so that they remain accurate in out-of-distribution scenes, before incremental SDF-based volumetric reconstruction with semantic labels. The main contributions are a depth refinement pipeline that needs no additional parameters and its integration with semantic SDF updates, which improve mask IoU and yield cleaner 3D reconstructions on a synthetic dataset and in real-robot experiments. The approach makes robotic perception more robust and accurate, supporting more reliable manipulation and scene understanding in indoor environments.
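As a rough illustration of the KDE-based rejection step, the sketch below estimates the density of the depth values under one predicted mask and keeps only the values around the dominant mode. It is a minimal sketch under stated assumptions, not the authors' algorithm: the function names, the Silverman rule-of-thumb bandwidth, the grid size, and the "walk downhill from the highest peak to the nearest valley" cutoff rule are all hypothetical choices made for this example.

```python
import math
import statistics


def silverman_bandwidth(values):
    """Rule-of-thumb bandwidth (assumption): derived from the data, not hand-tuned."""
    n = len(values)
    std = statistics.pstdev(values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = min(std, (q3 - q1) / 1.34) or std or 1e-3
    return 0.9 * spread * n ** (-0.2)


def kde_on_grid(values, grid, h):
    """Gaussian kernel density estimate evaluated at each grid point."""
    norm = 1.0 / (len(values) * h * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - v) / h) ** 2) for v in values)
        for g in grid
    ]


def refine_mask_depths(depths, grid_size=256):
    """Return (lo, hi, inlier_flags) for the depth values under one mask.

    Zero depth is treated as a hole. The cutoffs lo/hi are the density
    valleys on either side of the highest peak (hypothetical rule).
    """
    valid = [d for d in depths if d > 0]
    if len(valid) < 2:  # too few samples to estimate a density
        return None, None, [d > 0 for d in depths]

    h = silverman_bandwidth(valid)
    start, stop = min(valid) - 3 * h, max(valid) + 3 * h
    step = (stop - start) / (grid_size - 1)
    grid = [start + i * step for i in range(grid_size)]
    dens = kde_on_grid(valid, grid, h)

    # Dominant mode: assumed to correspond to the object itself.
    peak = max(range(grid_size), key=dens.__getitem__)

    # Walk downhill on both sides until the density stops decreasing.
    left = peak
    while left > 0 and dens[left - 1] < dens[left]:
        left -= 1
    right = peak
    while right < grid_size - 1 and dens[right + 1] < dens[right]:
        right += 1

    lo, hi = grid[left], grid[right]
    return lo, hi, [d > 0 and lo <= d <= hi for d in depths]


# Toy mask: an object near 1 m, background pixels near 2.5 m, two depth holes.
depths = (
    [0.95 + 0.0025 * i for i in range(40)]
    + [2.4 + 0.02 * i for i in range(8)]
    + [0.0, 0.0]
)
lo, hi, inliers = refine_mask_depths(depths)
```

In this toy example the mask over-covers the object, so it contains background depths; the flags returned in `inliers` mark which pixels would remain in the refined mask. A real pipeline would apply this per instance mask on the depth image and would probably use an array library rather than Python lists.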

Abstract

Reconstructing three-dimensional (3D) scenes with semantic understanding is vital in many robotic applications. Robots need to identify the objects in a scene, along with their positions and shapes, to manipulate them precisely for given tasks. Mobile robots, in particular, usually use lightweight networks to segment objects in RGB images and then localize them via depth maps; however, they often encounter out-of-distribution scenarios in which the masks over-cover the objects. In this paper, we address the problem of panoptic segmentation quality in 3D scene reconstruction by refining segmentation errors using non-parametric statistical methods. To enhance mask precision, we map the predicted masks into the depth frame and estimate the distribution of their depth values via kernel densities. Depth outliers are then rejected without additional parameters, in a manner that adapts to out-of-distribution scenarios, followed by 3D reconstruction using projective signed distance functions (SDFs). We validate our method on a synthetic dataset, where it improves both quantitative and qualitative results for panoptic mapping. Real-world testing further shows that our method can be deployed on a real-robot system. Our source code is available at: https://github.com/mkhangg/refined_panoptic_mapping.
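The projective SDF reconstruction step can be sketched as follows. This is a hedged illustration of the standard truncated-SDF running average, not the paper's implementation: the `Voxel` class, the `integrate` function, the truncation distance, the unit observation weight, and the per-voxel label voting are all assumptions introduced for this example.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Voxel:
    sdf: float = 0.0       # running average of truncated signed distances
    weight: float = 0.0    # number of fused observations
    votes: Counter = field(default_factory=Counter)  # label votes (assumption)

    @property
    def label(self):
        """Most frequently observed label, or None if never labeled."""
        return self.votes.most_common(1)[0][0] if self.votes else None


def integrate(voxel, voxel_depth, measured_depth, label, trunc=0.1):
    """Fuse one depth measurement into a voxel lying on the same camera ray.

    voxel_depth:    depth of the voxel center in the camera frame (meters)
    measured_depth: refined depth at the pixel the voxel projects to
    Returns True if the voxel was updated.
    """
    if measured_depth <= 0:      # depth hole or rejected outlier
        return False
    sdf = measured_depth - voxel_depth  # projective distance along the ray
    if sdf < -trunc:             # far behind the surface: occluded, skip
        return False
    sdf = min(sdf, trunc)        # clamp free space to the truncation band

    # Weighted running average with unit weight per observation.
    voxel.sdf = (voxel.weight * voxel.sdf + sdf) / (voxel.weight + 1.0)
    voxel.weight += 1.0

    # Only voxels near the surface receive a semantic vote.
    if abs(sdf) < trunc:
        voxel.votes[label] += 1
    return True


# A voxel 1.0 m from the camera, observed twice just in front of a cup surface.
v = Voxel()
integrate(v, voxel_depth=1.0, measured_depth=1.02, label="cup")
integrate(v, voxel_depth=1.0, measured_depth=1.04, label="cup")
```

Because the distance is measured along the viewing ray rather than to the nearest surface point, each update only needs the voxel's projected pixel and its depth, which keeps incremental integration cheap on a mobile robot.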

Paper Structure

This paper contains 22 sections, 7 equations, 6 figures, 2 tables, 1 algorithm.

Figures (6)

  • Figure 1: An indoor mobile robot (a) operates in an environment with multiple objects, (b) refines RGB-based segmentation masks using kernel density estimation on depth measurements, and (c) rebuilds a panoptic map with object instances using projective signed distance functions.
  • Figure 2: Depth maps of object instances containing depth outliers (top row) due to the imperfection of segmentation models and their density estimations along depth perception (middle row), and refined depth maps (bottom row). The shaded depth values on the density lines in between vertical red cutoff lines are considered inliers; otherwise, Alg. 1 rejects them as they appear to be outliers. The outliers are encoded by the same colors as Fig. 3 along with corresponding objects presented in the scene.
  • Figure 3: The scene of multiple objects with outliers boxed in red (left) and the scene without outliers after applying Alg. 1 (right).
  • Figure 4: Qualitative results on the flat dataset of (a) the original panoptic mapping approach, (b) the original approach coupled with mask refinement, (c) our approach without mask refinement, and (d) our approach with mask refinement. The room texture and its panoptic segmentation ground truth are retrieved based on RGB images and annotation masks provided by the original framework (Schmid et al., 2022). Note that the robot frame indicating its pose is simplified and represented as the RGB mesh frame in each reconstructed map.
  • Figure 5: Comparison of object detail reconstruction quality between (a) our approach with mask refinement and (b) the ground truth.
  • ...and 1 more figure