
Physically Embodied Gaussian Splatting: A Realtime Correctable World Model for Robotics

Jad Abou-Chakra, Krishan Rana, Feras Dayoub, Niko Sünderhauf

TL;DR

We propose a novel dual Gaussian-Particle representation that models the physical world, enables predictive simulation of future states, and allows online correction from visual observations in a dynamic world.

Abstract

For robots to robustly understand and interact with the physical world, it is highly beneficial to have a comprehensive representation (modelling geometry, physics, and visual observations) that informs perception, planning, and control algorithms. We propose a novel dual Gaussian-Particle representation that models the physical world while (i) enabling predictive simulation of future states and (ii) allowing online correction from visual observations in a dynamic world. Our representation comprises particles that capture the geometry of objects in the world and can be used alongside a particle-based physics system to anticipate physically plausible future states. Attached to these particles are 3D Gaussians that render images from any viewpoint through a splatting process, thus capturing the visual state. By comparing the predicted and observed images, our approach generates visual forces that correct the particle positions while respecting known physical constraints. By integrating predictive physical modelling with continuous, visually derived corrections, our unified representation reasons about the present and future while synchronizing with reality. Our system runs in realtime at 30 Hz using only 3 cameras. We validate our approach on 2D and 3D tracking tasks as well as on photometric reconstruction quality. Videos are available at https://embodied-gaussians.github.io/.
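The predict-then-correct loop described in the abstract can be illustrated with a deliberately simplified sketch. This is not the authors' implementation: Gaussian splatting is replaced by a toy top-down projection of particle positions, the image loss by a squared error on those projections, and every name and constant (`predict`, `render`, `visual_force`, `track`, the gain and damping values) is an assumption introduced for illustration. It only shows how a force derived from the mismatch between predicted and observed views can be fed into a constrained physics step.

```python
import numpy as np

GRAVITY = np.array([0.0, 0.0, -9.81])
DT = 1.0 / 30.0  # one physics step per camera frame at 30 Hz


def predict(pos, vel, force, mass=1.0):
    """Physics prediction: semi-implicit Euler followed by a ground constraint."""
    vel = vel + DT * (GRAVITY + force / mass)
    pos = pos + DT * vel
    below = pos[:, 2] < 0.0
    pos[below, 2] = 0.0  # known physical constraint: particles stay above the ground
    vel[below, 2] = 0.0
    return pos, vel


def render(pos):
    """Toy stand-in for splatting (assumption): top-down orthographic projection."""
    return pos[:, :2]


def visual_force(pos, observed, gain=50.0):
    """Force along the negative gradient of 0.5 * ||render(pos) - observed||^2."""
    grad = np.zeros_like(pos)
    grad[:, :2] = render(pos) - observed
    return -gain * grad


def track(pos, vel, observations, damping=0.8):
    """Alternate visual correction and physics prediction, one step per observation."""
    for observed in observations:
        force = visual_force(pos, observed)
        pos, vel = predict(pos, vel, force)
        vel[:, :2] *= damping  # illustrative damping so the correction settles
    return pos, vel


# Usage: particles start 0.2 m off in x and y and 0.1 m above the ground.
rng = np.random.default_rng(0)
true_xy = rng.uniform(-1.0, 1.0, size=(5, 2))
pos0 = np.column_stack([true_xy + 0.2, np.full(5, 0.1)])
vel0 = np.zeros_like(pos0)
pos, vel = track(pos0, vel0, [true_xy] * 60)
final_error = np.abs(render(pos) - true_xy).max()
```

In this sketch the correction enters the simulation as a force rather than by overwriting positions, so the ground constraint is still enforced on every step, which mirrors the abstract's point that corrections respect known physical constraints.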

Paper Structure (13 sections, 7 figures, 3 algorithms)

Figures (7)

  • Figure 1: The tabletop setup used in the real experiments, showing the robot, some of the objects used in the scenarios, and the positions of the 5 cameras used.
  • Figure 2: A profile of the functions called during the prediction and correction steps. In the 'Other' phase, the GUI is drawn and new sensor observations are read. The physics step takes $5~\text{ms}$ and is followed by approximately $22~\text{ms}$ of Adam optimization used to compute the visual forces.
  • Figure 3: An ablation showing the effect of different physical priors on the 3D tracking error of 12 points located on two objects on a tabletop. The scene used for this ablation is "Multiple1" from the simulated dataset. Using all physical priors produces on average the lowest tracking error.
  • Figure 4: The effect of varying the parameters of our system on 3D tracking.
  • Figure 5: The first image shows a highly dynamic scenario where the physics failed to push the TBlock into a location where visual forces could correct it. The second image shows a scenario where both visual and geometrical symmetries allowed the rope to rotate around its central axis, creating a steady-state error in tracking.
  • ...and 2 more figures
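
As a rough consistency check, the timings quoted in the Figure 2 caption can be set against the 30 Hz rate stated in the abstract. This is a derived estimate, not a number reported in the caption, and the symbols $T_\text{frame}$, $T_\text{physics}$, and $T_\text{visual}$ are labels introduced here:

$$
T_\text{frame} = \frac{1000~\text{ms}}{30} \approx 33.3~\text{ms}, \qquad
T_\text{physics} + T_\text{visual} \approx 5~\text{ms} + 22~\text{ms} = 27~\text{ms}, \qquad
T_\text{frame} - 27~\text{ms} \approx 6.3~\text{ms}.
$$

Assuming the steps run sequentially within a single frame, roughly $6~\text{ms}$ would remain for the 'Other' phase. The caption does not state that phase's duration.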