Khronos: A Unified Approach for Spatio-Temporal Metric-Semantic SLAM in Dynamic Environments

Lukas Schmid, Marcus Abate, Yun Chang, Luca Carlone

TL;DR

Khronos introduces the Spatio-temporal Metric-Semantic SLAM (SMS) problem and a unified, real-time solution that jointly handles short-term dynamics and long-term scene changes. The system decomposes the problem into local fragment estimation within an active window, a global optimization over robot and fragment poses via a deformation graph, and a reconciliation step that builds a dense 4D spatio-temporal map. The approach unifies existing interpretations of dynamic and changing environments, demonstrates strong performance on simulated and real-world benchmarks, and operates in real time across heterogeneous robots. This work advances long-term autonomy by enabling persistent semantic mapping and change detection in dynamic human-centric settings, with open-source code and datasets to support broader adoption.

Abstract

Perceiving and understanding highly dynamic and changing environments is a crucial capability for robot autonomy. While large strides have been made towards developing dynamic SLAM approaches that estimate the robot pose accurately, less emphasis has been put on the construction of dense spatio-temporal representations of the robot environment. A detailed understanding of the scene and its evolution through time is crucial for long-term robot autonomy and essential to tasks that require long-term reasoning, such as operating effectively in environments that are shared with humans and other agents and are thus subject to short-term and long-term dynamics. To address this challenge, this work defines the Spatio-temporal Metric-semantic SLAM (SMS) problem, and presents a framework to factorize and solve it efficiently. We show that the proposed factorization suggests a natural organization of a spatio-temporal perception system, where a fast process tracks short-term dynamics in an active temporal window, while a slower process reasons over long-term changes in the environment using a factor graph formulation. We provide an efficient implementation of the proposed spatio-temporal perception approach, which we call Khronos, and show that it unifies existing interpretations of short-term and long-term dynamics and is able to construct a dense spatio-temporal map in real time. We provide simulated and real results, showing that the spatio-temporal maps built by Khronos are an accurate reflection of a 3D scene over time and that Khronos outperforms baselines across multiple metrics. We further validate our approach on two heterogeneous robots in challenging, large-scale real-world environments.

Paper Structure

This paper contains 17 sections, 17 equations, 9 figures, and 3 tables.

Figures (9)

  • Figure 1: We propose Khronos, a unified approach to reason about short-term dynamics and long-term changes when performing metric-semantic simultaneous localization and mapping (SLAM) in dynamic environments. A few instances from Khronos' spatio-temporal map, representing the scene state at all times, are shown above. Short-term dynamics (left) are shown in magenta and compared against observed human actions over the corresponding time interval. We show the current and initial bounding box around the detected moving points as well as the centroid trajectory. Both humans and inanimate objects such as a cart (bottom left) are detected. Long-term changes (right) are shown for three time instances of the same scene. The earliest instance is at time 0:20 (top right). While the robot is moving through the hallways, a chair is removed and a red cooler is placed on top of the table; these changes are detected as the robot revisits and closes the loop at time 1:52 (bottom right). Lastly, the cooler is removed again, which is detected by the robot at time 3:35.
  • Figure 2: Key variables in our formulation. All observations $Z_j^t$ of $O_i$ are grouped into fragments $Y_k$, s.t. local consistency holds. Once it breaks, i.e., as $t(n) > t(2) + \delta$, measurements are grouped into a new fragment $Y_{k+1}$. This allows estimating $Y_k$ from the closed sets $\bar{Z}_k=\{Z_j^0,Z_j^1,Z_j^2\}$ and $\bar{\Phi}_k=\{\Phi^0,\Phi^1\}$ (left). Once the robot and fragment poses as well as associations are optimized, the object $O_i$ is fully specified by all of its fragments $\bar{Y}_i =\{ Y_k, Y_{k+1}\}$ (right).
  • Figure 3: Khronos takes in robot odometry $\Phi$, and semantic and RGBD inputs $Z$. The active window (Sec. \ref{sec:method_local}) uses these to estimate a local representation and object fragments $Y$. We then perform global optimization (Sec. \ref{sec:method_global}) to estimate robot poses $X$ and fragment association $A$. Finally, the optimized fragment states are reconciled (Sec. \ref{sec:method_reconciliation}) to estimate the spatio-temporal map of the scene.
  • Figure 4: Overview of the constructed deformation graph ${\cal E}$.
  • Figure 5: Deformable change detection. A $\mathbf{p}_q$ in front of $\mathbf{p}_v$ indicates the ray would pass through $\mathbf{p}_q$, and $\mathbf{p}_q$ is thus absent. Similarly, $\mathbf{p}_q$ near or behind $\mathbf{p}_v$ indicates presence or an occlusion, respectively. All $\mathbf{p}_r$ and $\mathbf{p}_v$ can freely move, rendering the approach deformable.
  • ...and 4 more figures
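
The ray test described in the Figure 5 caption can be illustrated with a small sketch. This is not the Khronos implementation. It assumes that $\mathbf{p}_r$ is the origin of the ray, that $\mathbf{p}_q$ lies approximately on the ray from $\mathbf{p}_r$ to $\mathbf{p}_v$, and that "near" is decided by a range tolerance. The function name `classify_query_point` and the parameter `tol` are hypothetical and introduced only for illustration.

```python
import numpy as np


def classify_query_point(p_r, p_v, p_q, tol=0.1):
    """Classify a query point p_q against an observed point p_v.

    Assumptions (not taken from the paper): p_r is the ray origin, p_q lies
    roughly on the ray from p_r to p_v, and `tol` is a hypothetical range
    tolerance, in the same units as the points, that defines "near".

    Returns "absent", "present", or "occluded".
    """
    p_r = np.asarray(p_r, dtype=float)
    range_v = np.linalg.norm(np.asarray(p_v, dtype=float) - p_r)
    range_q = np.linalg.norm(np.asarray(p_q, dtype=float) - p_r)

    if range_q < range_v - tol:
        # p_q is in front of p_v: the ray would pass through p_q.
        return "absent"
    if range_q > range_v + tol:
        # p_q is behind p_v: the observed surface hides it.
        return "occluded"
    # p_q is near p_v: the observation supports its presence.
    return "present"


if __name__ == "__main__":
    origin = [0.0, 0.0, 0.0]
    observed = [0.0, 0.0, 2.0]
    for query in ([0.0, 0.0, 1.0], [0.0, 0.0, 2.05], [0.0, 0.0, 3.0]):
        print(query, classify_query_point(origin, observed, query))
```

Because the test depends only on the current positions of $\mathbf{p}_r$, $\mathbf{p}_v$, and $\mathbf{p}_q$, it can be re-evaluated after those points move, which is in the spirit of the deformable property the caption mentions.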