DISC: Dense Integrated Semantic Context for Large-Scale Open-Set Semantic Mapping

Felix Igelbrink; Lennart Niecksch; Martin Atzmueller; Joachim Hertzberg

TL;DR

DISC (Dense Integrated Semantic Context) is an open-set semantic mapping framework built around a novel single-pass, distance-weighted feature extraction mechanism. It significantly surpasses current state-of-the-art zero-shot methods in both semantic accuracy and query retrieval, and provides a robust, real-time capable framework for robotic deployment.

Abstract

Open-set semantic mapping enables language-driven robotic perception, but current instance-centric approaches are bottlenecked by context-depriving and computationally expensive crop-based feature extraction. To overcome this fundamental limitation, we introduce DISC (Dense Integrated Semantic Context), featuring a novel single-pass, distance-weighted extraction mechanism. By deriving high-fidelity CLIP embeddings directly from the vision transformer's intermediate layers, our approach eliminates the latency and domain-shift artifacts of traditional image cropping, yielding pure, mask-aligned semantic representations. To fully leverage these features in large-scale continuous mapping, DISC is built upon a fully GPU-accelerated architecture that replaces periodic offline processing with precise, on-the-fly voxel-level instance refinement. We evaluate our approach on standard benchmarks (Replica, ScanNet) and a newly generated large-scale-mapping dataset based on Habitat-Matterport 3D (HM3DSEM) to assess scalability across complex scenes in multi-story buildings. Extensive evaluations demonstrate that DISC significantly surpasses current state-of-the-art zero-shot methods in both semantic accuracy and query retrieval, providing a robust, real-time capable framework for robotic deployment. The full source code, data generation and evaluation pipelines will be made available at https://github.com/DFKI-NI/DISC.

Paper Structure (20 sections, 3 equations, 5 figures, 5 tables)

INTRODUCTION
RELATED WORK
Open-Set Semantic Mapping
Scalability and Instance Refinement
Vision-Language Feature Integration
METHOD
Segmentation and Feature Extraction
Data Association and Scene Integration
Vision-Language Feature Integration
EVALUATION
Experimental Setup and Metrics
3D Open-Set Semantic Segmentation
Object Level Semantics on HM3DSEM
Large-Scale Dataset Generation
Dataset: Configuration and Evaluation
...and 5 more sections

Figures (5)

Figure 1: Example mapping results on an HM3D scene. Top: The tracked instances of the map, randomly colored. Bottom: The resulting semantic segmentation with the $\text{top}_k$ semantic classes.
Figure 2: Visualization of our single-pass dense feature extraction. (a) The original RGB input frame. (b) Weighted dense cosine similarity heatmap for the open-vocabulary query "an image of a chair". The extracted patch features provide precise semantic grounding without requiring image crops.
Figure 3: Generated trajectory from HM3D scene 00800. Red: navigation mesh. Blue lines: generated trajectory. Teal points: coverage analysis from ray tracing, shown as a downsampled voxel grid.
Figure 4: (a) Ratio of covered objects ($\text{Area} > 50\%$ of surface covered) and ratio of covered surface for our generated trajectories on HM3D. The high variance arises because the largest connected component of Habitat's navigation mesh covers the scene only partially (Largest Island/Total Area). (b) Tour length plotted against the navigable area, with color encoding the resulting dataset size (= simulation steps required).
Figure 5: Performance of the mapping pipeline on the exemplary HM3D scene 00849.
