OSMa-Bench: Evaluating Open Semantic Mapping Under Varying Lighting Conditions

Maxim Popov, Regina Kurkova, Mikhail Iumanov, Jaafar Mahmoud, Sergey Kolyubin

TL;DR

OSMa-Bench evaluates the robustness of open semantic mapping (OSM) to indoor lighting variability. It introduces a dynamic, automated benchmarking pipeline powered by LLMs/LVLMs and Habitat-Sim simulation, and extends ReplicaCAD and HM3D with per-instance semantics and lighting variations. The study benchmarks OpenScene, ConceptGraphs, and BBQ, using semantic segmentation metrics and a novel scene-graph VQA evaluation to quantify performance and reasoning under different lighting. The findings show distinct strengths among the methods and expose systematic failure modes under low light and dynamic illumination, which can guide future development of resilient OSM systems. The framework is designed to scale and to extend to additional capabilities and environments.

Abstract

Open Semantic Mapping (OSM) is a key technology in robotic perception, combining semantic segmentation and SLAM techniques. This paper introduces a dynamically configurable and highly automated LLM/LVLM-powered pipeline for evaluating OSM solutions called OSMa-Bench (Open Semantic Mapping Benchmark). The study focuses on evaluating state-of-the-art semantic mapping algorithms under varying lighting conditions, a critical challenge in indoor environments. We introduce a novel dataset with simulated RGB-D sequences and ground truth 3D reconstructions, facilitating rigorous analysis of mapping performance across different lighting conditions. Through experiments on leading models such as ConceptGraphs, BBQ, and OpenScene, we evaluate the semantic fidelity of object recognition and segmentation. Additionally, we introduce a Scene Graph evaluation method to analyze the ability of models to interpret semantic structure. The results provide insights into the robustness of these models, informing future research directions for developing resilient and adaptable robotic systems. The project page is available at https://be2rlab.github.io/OSMa-Bench/.

Paper Structure

This paper contains 24 sections, 17 figures, 3 tables.

Figures (17)

  • Figure 1: Our work evaluates open semantic mapping quality, providing an automated LLM/LVLM-based alternative to human assessment. We generate test sequences with different lighting conditions in a simulated indoor environment. Extra modifiers, such as variations in the robot's nominal velocity, are applied as well.
  • Figure 2: Evaluation Pipeline Diagram. We use Habitat Sim to establish test configurations that include various environmental factors like lighting and agent trajectories, generating a diverse dataset. This allows for tailored scenes, with randomly initialized agent trajectories simulating realistic interactions for effective model evaluation.
  • Figure 3: ReplicaCAD Dataset with Augmented Semantics. We expanded the semantic annotations of the ReplicaCAD dataset, so that testing can account both for classes describing parts of the apartment (for example, "wall", "floor", "stairs") and for classes describing furniture and household utensils.
  • Figure 4: VQA Pipeline Diagram. We use an LLM and an LVLM to obtain descriptions of scene frames and to construct a set of questions with corresponding ground-truth answers. The questions are then validated and balanced so that ambiguous ones are excluded from the subsequent scene graph evaluation.
  • Figure 5: Comparison of changes (%) in mAcc and f-mIoU across conditions relative to the baseline on the ReplicaCAD dataset.
  • ...and 12 more figures
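
Figure 5 reports percentage changes in mAcc and f-mIoU relative to a baseline condition. The following is a minimal sketch of how such numbers could be computed, not the benchmark's own implementation. It assumes that mAcc is the mean per-class accuracy, that f-mIoU is the frequency-weighted mean IoU, and that predictions have already been matched to ground-truth classes and tallied in a confusion matrix. All function names and the toy matrices are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[int]]  # conf[i][j]: points of ground-truth class i predicted as class j


def per_class_stats(conf: Matrix) -> Tuple[List[Tuple[int, int, int]], int]:
    """Return (true positives, ground-truth count, union) per class and the total count."""
    n = len(conf)
    total = sum(sum(row) for row in conf)
    stats = []
    for c in range(n):
        tp = conf[c][c]
        gt = sum(conf[c])                         # all points whose true class is c
        pred = sum(conf[r][c] for r in range(n))  # all points predicted as c
        stats.append((tp, gt, gt + pred - tp))
    return stats, total


def mean_accuracy(conf: Matrix) -> float:
    """mAcc (assumed definition): per-class accuracy averaged over classes present in the ground truth."""
    stats, _ = per_class_stats(conf)
    accs = [tp / gt for tp, gt, _ in stats if gt > 0]
    return sum(accs) / len(accs) if accs else 0.0


def frequency_weighted_miou(conf: Matrix) -> float:
    """f-mIoU (assumed definition): per-class IoU weighted by ground-truth class frequency."""
    stats, total = per_class_stats(conf)
    if total == 0:
        return 0.0
    return sum((gt / total) * (tp / union) for tp, gt, union in stats if gt > 0)


def relative_change_pct(value: float, baseline: float) -> float:
    """Percentage change of a metric relative to its baseline value."""
    return 100.0 * (value - baseline) / baseline


# Toy two-class confusion matrices (illustrative values only, not results from the paper).
baseline_conf = [[8, 2], [1, 9]]
low_light_conf = [[6, 4], [3, 7]]

macc_change = relative_change_pct(mean_accuracy(low_light_conf), mean_accuracy(baseline_conf))
fmiou_change = relative_change_pct(
    frequency_weighted_miou(low_light_conf), frequency_weighted_miou(baseline_conf)
)
print(f"mAcc change: {macc_change:.1f}%, f-mIoU change: {fmiou_change:.1f}%")
```

Reporting changes relative to the baseline, rather than absolute scores, makes methods with different nominal accuracy comparable in terms of how much each degrades under a given lighting condition.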