3D Dynamic Scene Graphs: Actionable Spatial Perception with Places, Objects, and Humans

Antoni Rosinol; Arjun Gupta; Marcus Abate; Jingnan Shi; Luca Carlone

TL;DR

3D Dynamic Scene Graphs (DSGs) unify geometry, semantics, and dynamics into a layered scene representation suitable for planning and decision-making. The authors present SPIN, an automatic pipeline that builds DSGs from visual-inertial data, integrating object and dense human mesh detection with place/room parsing. They demonstrate the system in a photo-realistic Unity simulator, showing robustness in crowded scenes and accurate parsing of humans, objects, places, and rooms. The work enables actionable planning, human-robot interaction, long-term autonomy, and scene prediction by providing hierarchical, time-aware scene representations.

Abstract

We present a unified representation for actionable spatial perception: 3D Dynamic Scene Graphs. Scene graphs are directed graphs where nodes represent entities in the scene (e.g. objects, walls, rooms), and edges represent relations (e.g. inclusion, adjacency) among nodes. Dynamic scene graphs (DSGs) extend this notion to represent dynamic scenes with moving agents (e.g. humans, robots), and to include actionable information that supports planning and decision-making (e.g. spatio-temporal relations, topology at different levels of abstraction). Our second contribution is to provide the first fully automatic Spatial PerceptIon eNgine (SPIN) to build a DSG from visual-inertial data. We integrate state-of-the-art techniques for object and human detection and pose estimation, and we describe how to robustly infer object, robot, and human nodes in crowded scenes. To the best of our knowledge, this is the first paper that reconciles visual-inertial SLAM and dense human mesh tracking. Moreover, we provide algorithms to obtain hierarchical representations of indoor environments (e.g. places, structures, rooms) and their relations. Our third contribution is to demonstrate the proposed spatial perception engine in a photo-realistic Unity-based simulator, where we assess its robustness and expressiveness. Finally, we discuss the implications of our proposal on modern robotics applications. 3D Dynamic Scene Graphs can have a profound impact on planning and decision-making, human-robot interaction, long-term autonomy, and scene prediction. A video abstract is available at https://youtu.be/SWbofjhyPzI.
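To make the graph definition in the abstract concrete, here is a minimal Python sketch of a layered directed graph whose nodes can carry time-stamped positions. It is an illustration only, not the authors' implementation or the SPIN pipeline. The class names `Node` and `SceneGraph`, the layer names in `LAYERS`, and the `contains` relation label are hypothetical choices made for this example.

```python
from dataclasses import dataclass, field

# Hypothetical layer names, loosely based on the entities the abstract mentions.
LAYERS = ("objects_agents", "places_structures", "rooms")


@dataclass
class Node:
    node_id: str
    layer: str
    label: str
    # Maps timestamp -> (x, y, z). A static entity has one entry,
    # a moving agent (e.g. a human) has one entry per observation.
    poses: dict = field(default_factory=dict)


@dataclass
class SceneGraph:
    nodes: dict = field(default_factory=dict)   # node_id -> Node
    edges: list = field(default_factory=list)   # (src_id, relation, dst_id)

    def add_node(self, node):
        if node.layer not in LAYERS:
            raise ValueError(f"unknown layer: {node.layer}")
        self.nodes[node.node_id] = node

    def add_edge(self, src, relation, dst):
        # Directed edge; both endpoints must already exist.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append((src, relation, dst))

    def children(self, node_id, relation="contains"):
        # Targets of outgoing edges with the given relation, in insertion order.
        return [d for s, r, d in self.edges if s == node_id and r == relation]


# Usage: a room that contains a static object and a moving human.
g = SceneGraph()
g.add_node(Node("room_1", "rooms", "kitchen"))
g.add_node(Node("obj_1", "objects_agents", "chair", {0.0: (1.0, 2.0, 0.0)}))
g.add_node(Node("human_1", "objects_agents", "human",
                {0.0: (0.0, 0.0, 0.0), 1.0: (0.5, 0.0, 0.0)}))
g.add_edge("room_1", "contains", "obj_1")
g.add_edge("room_1", "contains", "human_1")
print(g.children("room_1"))
```

Storing poses per timestamp is one simple way to represent the dynamic part of the graph. The paper itself may represent agents differently, for example with dense meshes and pose graphs.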
