Learning rigid-body simulators over implicit shapes for large-scale scenes and vision

Yulia Rubanova, Tatiana Lopez-Guevara, Kelsey R. Allen, William F. Whitney, Kimberly Stachenfeld, Tobias Pfaff

TL;DR

SDF-Sim is a scalable learned rigid-body simulator that represents each object's shape with its own learned SDF and uses a novel SDF-based inter-object edge construction to reduce collision-edge complexity from quadratic to linear in the number of surface nodes. By querying SDFs for collision handling and closest-point computations, the method lets graph-network simulators scale to scenes with hundreds of objects and up to 1.1 million nodes, including real-world scenes reconstructed from multi-view images. Experiments on Kubric Movi-B/C, large-scale sphere-in-bowl tests, and vision-derived scenes show substantial memory and runtime savings at competitive accuracy, and indicate that the approach can handle scenes where mesh-based simulators run out of memory. The main limitation is that an SDF must be trained per shape; future work aims to amortize SDF training and to extend the method to deformable or articulated objects, broadening applicability to animation, robotics, and VR.
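To make the quadratic-to-linear claim concrete, here is a minimal sketch of SDF-based edge construction. It is an illustration under stated assumptions, not the authors' implementation: analytic sphere SDFs stand in for the learned per-object SDFs, and the names `sphere_sdf`, `sdf_edges`, and `threshold` are hypothetical. Each surface node is tested against another object's SDF with a single query, instead of being compared with every node or face of that object.

```python
import math

def sphere_sdf(center, radius):
    """Analytic sphere SDF; a stand-in (assumption) for a learned MLP SDF."""
    def sdf(p):
        return math.dist(p, center) - radius
    return sdf

def sdf_edges(nodes_by_object, sdf_by_object, threshold):
    """Link a surface node to another object when that object's SDF says
    the node lies within `threshold` of its surface.

    Each node costs one SDF query per other object, so the work grows with
    the number of surface nodes, not with the number of node pairs.
    """
    edges = []
    for src_obj, nodes in nodes_by_object.items():
        for dst_obj, sdf in sdf_by_object.items():
            if dst_obj == src_obj:
                continue  # no collision edges within a single rigid object
            for node_idx, p in enumerate(nodes):
                d = sdf(p)
                if d < threshold:
                    edges.append((src_obj, node_idx, dst_obj, d))
    return edges

# Two unit spheres whose surfaces are 0.1 apart along the z axis.
nodes_by_object = {
    "a": [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0)],  # surface nodes of sphere a
    "b": [(0.0, 0.0, 1.1), (0.0, 0.0, 3.1)],   # surface nodes of sphere b
}
sdf_by_object = {
    "a": sphere_sdf((0.0, 0.0, 0.0), 1.0),
    "b": sphere_sdf((0.0, 0.0, 2.1), 1.0),
}
edges = sdf_edges(nodes_by_object, sdf_by_object, threshold=0.2)
```

In this toy scene only the two facing nodes receive edges; the nodes on the far sides of the spheres are skipped after a single SDF query each.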

Abstract

Simulating large scenes with many rigid objects is crucial for a variety of applications, such as robotics, engineering, film and video games. Rigid interactions are notoriously hard to model: small changes to the initial state or the simulation parameters can lead to large changes in the final state. Recently, learned simulators based on graph networks (GNNs) were developed as an alternative to hand-designed simulators like MuJoCo and PyBullet. They are able to accurately capture the dynamics of real objects directly from real-world observations. However, current state-of-the-art learned simulators operate on meshes and scale poorly to scenes with many objects or detailed shapes. Here we present SDF-Sim, the first learned rigid-body simulator designed for scale. We use learned signed-distance functions (SDFs) to represent the object shapes and to speed up distance computation. We design the simulator to leverage SDFs and avoid the fundamental bottleneck of previous simulators associated with collision detection. For the first time in the literature, we demonstrate that we can scale GNN-based simulators to scenes with hundreds of objects and up to 1.1 million nodes, where mesh-based approaches run out of memory. Finally, we show that SDF-Sim can be applied to real-world scenes by extracting SDFs from multi-view images.
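One reason an SDF speeds up distance computation: for an exact SDF f, the distance from a query point x to the surface is f(x), and the closest surface point is x - f(x) ∇f(x). The sketch below illustrates this with an analytic sphere and a finite-difference gradient. Both are assumptions made for the example; a learned SDF is only approximate and would more likely be differentiated automatically. The helper names are hypothetical.

```python
import math

def sphere_sdf(p, center=(0.0, 0.0, 0.0), radius=1.0):
    """Analytic sphere SDF; a stand-in (assumption) for a learned SDF."""
    return math.dist(p, center) - radius

def sdf_gradient(sdf, p, eps=1e-5):
    """Central-difference estimate of the SDF gradient at point p."""
    grad = []
    for i in range(len(p)):
        hi, lo = list(p), list(p)
        hi[i] += eps
        lo[i] -= eps
        grad.append((sdf(hi) - sdf(lo)) / (2.0 * eps))
    return grad

def closest_surface_point(sdf, p):
    """Project p onto the zero level set: x - f(x) * grad f(x) / |grad f(x)|."""
    d = sdf(p)
    g = sdf_gradient(sdf, p)
    norm = math.sqrt(sum(c * c for c in g))
    if norm == 0.0:
        raise ValueError("gradient vanishes here; closest point is not unique")
    closest = tuple(x - d * c / norm for x, c in zip(p, g))
    return closest, d

# A point two units above a unit sphere projects onto the sphere's north pole.
closest, distance = closest_surface_point(sphere_sdf, (0.0, 0.0, 3.0))
```

The gradient is normalized before the projection because a finite-difference or learned gradient need not have exactly unit length.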

Paper Structure (56 sections, 2 equations, 19 figures, 4 tables)

Figures (19)

  • Figure 1: Overview of SDF-Sim pipeline. SDFs parameterized by MLPs are learned for each object to implicitly represent the object shape and the distance field. A GNN-based simulator uses learned SDFs to predict object dynamics for the next simulation step.
  • Figure 2: Example of rollouts from SDF-Sim scaled to large simulations, all simulated for 200 steps. 300 shoes (object from Movi-C), with 851k nodes, falling onto the floor. See more examples with up to 1.1 million nodes in Figure \ref{app:large_falling_pile} and simulation videos at https://sites.google.com/view/sdf-sim.
  • Figure 4: Construction of graph edges in SDF-Sim.
  • Figure 5: Comparison of the last frames of rollouts predicted on Movi-C. See more frames in Figure \ref{fig:rollouts} and at https://sites.google.com/view/sdf-sim.
  • Figure 6: Accuracy, memory and runtime comparisons between the SDF-Sim model and the mesh-based baselines on the Movi-B/C benchmarks. On Movi-C, most baselines except FIGNet* run out of memory and are not shown. As "Peak Memory" we report the peak memory used by the model per single step of the simulation. DPI, MGN-Large-Radius and MGN results were reported by allen2023fignet. See Tables \ref{app:accuracy_comparison_movi_b} and \ref{app:accuracy_comparison_movi_c} for the exact numbers.
  • ...and 14 more figures