Understanding while Exploring: Semantics-driven Active Mapping

Liyan Chen, Huangying Zhan, Hairong Yin, Yi Xu, Philippos Mordohai

TL;DR

ActiveSGM addresses the challenge of proactive scene understanding by unifying geometry, semantics, and exploration in a dense active semantic mapping framework. It introduces a Semantic Gaussian Map built on 3D Gaussian Splatting, a sparse top-k semantic representation, and a differentiable rendering-based optimization that integrates OneFormer predictions, enabling real-time semantic reconstruction. The method jointly optimizes an uncertainty-guided next-best-view policy, using geometric and semantic criteria with a motion cost, and employs a global-local keyframe strategy to maintain coverage. Evaluations on Replica and MP3D show improved semantic coverage, robust performance with noisy predictions, and competitive 3D reconstruction and novel view synthesis, highlighting the practical impact of semantics-aware active mapping for autonomous robotics.

Abstract

Effective robotic autonomy in unknown environments demands proactive exploration and precise understanding of both geometry and semantics. In this paper, we propose ActiveSGM, an active semantic mapping framework designed to predict the informativeness of potential observations before execution. Built upon a 3D Gaussian Splatting (3DGS) mapping backbone, our approach employs semantic and geometric uncertainty quantification, coupled with a sparse semantic representation, to guide exploration. By enabling robots to strategically select the most beneficial viewpoints, ActiveSGM efficiently enhances mapping completeness, accuracy, and robustness to noisy semantic data, ultimately supporting more adaptive scene exploration. Our experiments on the Replica and Matterport3D datasets highlight the effectiveness of ActiveSGM in active semantic mapping tasks.
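
The viewpoint selection described above (geometric and semantic criteria combined with a motion cost) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Candidate` fields, the linear weighting, the weight values, and the straight-line motion cost are hypothetical and are not taken from the paper, whose actual criteria are not given in this summary.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical candidate viewpoint with precomputed gains (illustrative only)."""
    name: str
    position: tuple        # (x, y, z) position of the candidate view
    geometric_gain: float  # assumed proxy for geometric uncertainty seen from the view
    semantic_gain: float   # assumed proxy for semantic uncertainty seen from the view


def motion_cost(current, target):
    """Straight-line distance, used here as a stand-in for a planner's path cost."""
    return math.dist(current, target)


def select_next_best_view(current, candidates, w_geo=1.0, w_sem=1.0, w_motion=0.1):
    """Return the candidate with the highest uncertainty gain minus weighted motion cost."""
    def score(c):
        gain = w_geo * c.geometric_gain + w_sem * c.semantic_gain
        return gain - w_motion * motion_cost(current, c.position)
    return max(candidates, key=score)


# Usage: a close view with little to learn versus a distant, informative one.
views = [
    Candidate("near", (1.0, 0.0, 0.0), geometric_gain=0.2, semantic_gain=0.3),
    Candidate("far", (10.0, 0.0, 0.0), geometric_gain=0.9, semantic_gain=0.9),
]
best_cheap_motion = select_next_best_view((0.0, 0.0, 0.0), views, w_motion=0.1)
best_costly_motion = select_next_best_view((0.0, 0.0, 0.0), views, w_motion=0.2)
```

In this toy setup, raising the motion weight shifts the choice from the distant view to the nearby one, which is the trade-off a motion cost is meant to capture.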

Paper Structure

This paper contains 34 sections, 8 equations, 8 figures, 8 tables.

Figures (8)

  • Figure 1: Overview of the ActiveSGM System. Our framework integrates observation, mapping, and planning into a unified active semantic mapping system. At each time step, posed RGB-D frames along with semantic predictions from OneFormer (Jain et al., 2023) are stored in a keyframe database. Selected frames are used to update a Semantic Gaussian Map that encodes geometric, photometric, and semantic properties and is optimized through differentiable rendering. An occupancy-based Exploration Map is updated using the current view and used to sample candidate viewpoints in free space. Next-best views are selected by jointly evaluating geometric and semantic exploration criteria (E.C.), and a path planner navigates toward the selected pose. This closed-loop system enables efficient, semantics-aware reconstruction and exploration in complex 3D environments.
  • Figure 2: Qualitative Results for Replica. Our method generates denser and more accurate semantic maps than SGS-SLAM, with fewer exploration steps. Yellow boxes highlight improved boundaries and semantic consistency. Black regions denote unknown labels.
  • Figure 3: Color-Coding Ambiguities. SGS-SLAM blends colors, which leads to label confusion, especially under global conversion, and introduces irrelevant categories.
  • Figure 4: Qualitative Results for MP3D. Top-down visualizations of the reconstructed scene, semantic labels, and semantic entropy heatmap (low, high). Notably, our results show no high-entropy regions and yield coherent, dense semantic reconstructions even in large-scale MP3D scenes.
  • Figure S.1: Visualization of Rendering Semantic Map with Sparse Semantic Vector. Each Gaussian stores only the indexes and probabilities of its top-$k$ most probable categories; the semantic distribution of a given pixel is rendered following Eqn. (3) in the main paper.
  • ...and 3 more figures
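
The sparse top-$k$ semantic vector mentioned in the Figure S.1 caption can be sketched in a few lines. This is a rough illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, renormalizing the kept probabilities is an assumption, and the weighted accumulation in `blend` is a generic stand-in for the rendering of Eqn. (3), which is not reproduced in this summary.

```python
import math


def to_top_k(probs, k):
    """Keep the k most probable classes as (index, probability) pairs.

    Renormalizing the kept probabilities to sum to 1 is an assumption.
    """
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    return [(i, probs[i] / total) for i in top]


def blend(sparse_vectors, weights, num_classes):
    """Accumulate weighted sparse vectors into a dense per-pixel distribution.

    A generic weighted sum, used as a placeholder for the paper's rendering step.
    """
    dense = [0.0] * num_classes
    for vec, w in zip(sparse_vectors, weights):
        for idx, p in vec:
            dense[idx] += w * p
    total = sum(dense)
    return [d / total for d in dense] if total > 0 else dense


def entropy(probs):
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Usage: two Gaussians over 4 classes, each keeping only its top-2 categories.
g1 = to_top_k([0.10, 0.60, 0.05, 0.25], k=2)
g2 = to_top_k([0.80, 0.10, 0.05, 0.05], k=2)
pixel = blend([g1, g2], weights=[0.75, 0.25], num_classes=4)
pixel_entropy = entropy(pixel)
```

Storing only `k` pairs per Gaussian keeps memory independent of the total number of classes, while the per-pixel entropy of the blended distribution gives one possible measure of semantic uncertainty.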