DynamicGSG: Dynamic 3D Gaussian Scene Graphs for Environment Adaptation

Luzhou Ge, Xiangyu Zhu, Zhuo Yang, Xuesong Li

TL;DR

DynamicGSG addresses the challenge of changing real-world environments by constructing dynamic, high-fidelity 3D Gaussian scene graphs that couple open-vocabulary semantic cues with object-centric Gaussian maps. The approach combines fast differentiable Gaussian splatting, 3D-2D object association, and joint supervision to enable rapid local map updates, and it uses a large vision-language model (LVLM) to build hierarchical scene graphs. The authors report strong results in semantic segmentation, language-guided object retrieval, and reconstruction quality, and they validate dynamic updates in real laboratory environments. The work targets robust, long-term environmental understanding and adaptation for indoor robotic tasks.

Abstract

In real-world scenarios, environment changes caused by human or agent activities make it extremely challenging for robots to perform various long-term tasks. Recent works typically struggle to understand and adapt to dynamic environments, both because they cannot update their in-memory environment representations when the environment changes and because they lack fine-grained reconstruction of the environments. To address these challenges, we propose DynamicGSG, a dynamic, high-fidelity, open-vocabulary scene graph construction system leveraging Gaussian splatting. DynamicGSG builds hierarchical scene graphs using advanced vision-language models to represent the spatial and semantic relationships between objects in the environments, utilizes a joint feature loss we designed to supervise Gaussian instance grouping while optimizing the Gaussian maps, and locally updates the Gaussian scene graphs according to real environment changes for long-term environment adaptation. Experiments and ablation studies demonstrate the performance and efficacy of our proposed method in terms of semantic segmentation, language-guided object retrieval, and reconstruction quality. Furthermore, we validate the dynamic updating capabilities of our system in real laboratory environments. The source code and supplementary experimental materials will be released at https://github.com/GeLuzhou/Dynamic-GSG.

Paper Structure

This paper contains 29 sections, 16 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: The dynamic high-fidelity multi-layer Gaussian scene graphs we constructed can adapt to environment changes, represent the spatial and semantic relationships of the objects, and support various forms of language-guided object retrieval.
  • Figure 2: Overview of DynamicGSG: Our system processes posed RGB-D sequences, utilizes open-vocabulary object detection and segmentation models to obtain 2D masks, and extracts the corresponding semantic features. In parallel, we employ instance-level rendering to get 2D masks and semantic features of objects in the map for object fusion. Subsequently, we perform Gaussian initialization and joint optimization to incrementally create a high-fidelity object-centric Gaussian map. Based on the spatial relationships of objects and the capabilities of an LVLM, we construct a hierarchical scene graph to provide a structured description of the scene. In dynamic real-world scenarios, after refining the initial camera poses obtained from VINS-Fusion [qin2019a], we detect local changes and make corresponding modifications in the Gaussian map and scene graph for environment adaptation.
  • Figure 3: 3D-2D Gaussian Object Association.
  • Figure 4: Visualization of Feature Loss Ablation Experiments.
  • Figure 5: Qualitative Results of Object Retrieval: Through hierarchical scene-graph-based matching, DynamicGSG effectively locates objects that ConceptGraphs [gu2023conceptgraphsopenvocabulary3dscene] cannot retrieve.
  • ...and 1 more figure
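
The captions for Figures 2 and 3 mention associating 2D detections with objects already in the Gaussian map by comparing detected masks and semantic features against instance-level renderings. This summary does not give the actual matching criteria, so the following is only a minimal sketch of one plausible scheme. The function names, the weighted IoU-plus-cosine score, the weights, and the threshold are all assumptions made for illustration, not the authors' method.

```python
import math
from typing import Dict, List, Optional, Sequence, Set, Tuple

Pixel = Tuple[int, int]                      # (row, col)
Observation = Tuple[Set[Pixel], Sequence[float]]  # (2D mask, semantic feature)


def mask_iou(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Intersection-over-union of two masks given as sets of pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def associate(
    detections: List[Observation],
    rendered: Dict[int, Observation],
    w_geo: float = 0.5,       # hypothetical weight on mask overlap
    w_sem: float = 0.5,       # hypothetical weight on feature similarity
    threshold: float = 0.5,   # hypothetical minimum score for a match
) -> List[Optional[int]]:
    """For each detection, return the id of the best-scoring map object,
    or None if no object scores above the threshold (treat as a new object)."""
    matches: List[Optional[int]] = []
    for det_mask, det_feat in detections:
        best_id, best_score = None, threshold
        for obj_id, (obj_mask, obj_feat) in rendered.items():
            score = (w_geo * mask_iou(det_mask, obj_mask)
                     + w_sem * cosine(det_feat, obj_feat))
            if score > best_score:
                best_id, best_score = obj_id, score
        matches.append(best_id)
    return matches


# Toy example: two map objects rendered into the current view.
rendered = {
    0: ({(0, 0), (0, 1), (1, 0), (1, 1)}, [1.0, 0.0]),
    1: ({(5, 5), (5, 6)}, [0.0, 1.0]),
}
detections = [
    ({(0, 0), (0, 1), (1, 0)}, [0.9, 0.1]),  # overlaps object 0
    ({(9, 9)}, [0.5, 0.5]),                  # overlaps nothing
]
print(associate(detections, rendered))  # [0, None]
```

In this sketch each detection is matched independently, so two detections can map to the same object. A real system would likely need one-to-one assignment and a rule for initializing new Gaussians from unmatched detections.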