Learning rigid-body simulators over implicit shapes for large-scale scenes and vision
Yulia Rubanova, Tatiana Lopez-Guevara, Kelsey R. Allen, William F. Whitney, Kimberly Stachenfeld, Tobias Pfaff
TL;DR
SDF-Sim is a scalable learned rigid-body simulator that represents each object's shape with its own signed distance function (SDF) and introduces a novel SDF-based construction of inter-object edges, which reduces the growth of collision edges from quadratic to linear in the number of surface nodes. By querying SDFs for collision handling and closest-point computations, the method lets graph neural networks simulate scenes with hundreds of objects and up to 1.1 million nodes, including real-world scenes derived from vision. Extensive experiments on Kubric MOVi-B/C, large-scale sphere-in-bowl tests, and vision-derived scenes demonstrate substantial memory and runtime savings with competitive accuracy, and show the potential to handle real-world scenes beyond the limits of mesh-based simulators. A limitation is that an SDF must be trained for each shape; future work aims to amortize SDF training and to extend the approach to deformable or articulated objects, broadening its applicability to animation, robotics, and VR.
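To illustrate how SDF queries can replace pairwise collision checks, here is a minimal sketch of SDF-based collision-edge construction. It assumes analytic sphere SDFs as stand-ins for the learned per-object SDFs, a finite-difference gradient, and a simple (node, object, closest point, distance) edge record; all function names and edge features here are hypothetical and are not SDF-Sim's actual implementation. The closest surface point comes from projecting along the SDF gradient, x − d(x)∇d(x), which is exact for a true distance field.

```python
import numpy as np


def sphere_sdf(points, center, radius):
    """Signed distance to a sphere (negative inside, positive outside).

    Hypothetical stand-in for a learned per-object SDF.
    """
    return np.linalg.norm(points - center, axis=-1) - radius


def sdf_gradient(sdf, points, eps=1e-4):
    """Central finite-difference gradient of `sdf` at each row of `points`."""
    grads = np.zeros_like(points)
    for axis in range(points.shape[-1]):
        offset = np.zeros(points.shape[-1])
        offset[axis] = eps
        grads[:, axis] = (sdf(points + offset) - sdf(points - offset)) / (2.0 * eps)
    return grads


def closest_surface_points(sdf, points):
    """Project points onto the SDF's zero level set: x - d(x) * grad d(x)."""
    dist = sdf(points)
    return points - dist[:, None] * sdf_gradient(sdf, points), dist


def build_collision_edges(node_positions, node_object_ids, object_sdfs, collision_radius):
    """Connect each surface node to every *other* object whose SDF says it is close.

    Each edge is (node index, object index, closest point on that object, distance).
    There is at most one edge per (node, other object) pair, so the edge count grows
    linearly with the number of nodes rather than with the number of node or face pairs.
    """
    edges = []
    for obj_id, sdf in enumerate(object_sdfs):
        candidates = np.nonzero(node_object_ids != obj_id)[0]  # nodes of other objects
        if candidates.size == 0:
            continue
        closest, dist = closest_surface_points(sdf, node_positions[candidates])
        near = dist < collision_radius
        for node, point, d in zip(candidates[near], closest[near], dist[near]):
            edges.append((int(node), obj_id, point, float(d)))
    return edges
```

For two unit spheres whose surfaces are 0.05 apart along the x-axis, with a collision radius of 0.1, only the two facing nodes produce edges, and each edge's closest point lands on the other sphere's surface. Nodes far from any other object cost one SDF query per object and create no edges.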
Abstract
Simulating large scenes with many rigid objects is crucial for a variety of applications, such as robotics, engineering, film and video games. Rigid interactions are notoriously hard to model: small changes to the initial state or the simulation parameters can lead to large changes in the final state. Recently, learned simulators based on graph neural networks (GNNs) have been developed as an alternative to hand-designed simulators like MuJoCo and PyBullet. They can accurately capture the dynamics of real objects directly from real-world observations. However, current state-of-the-art learned simulators operate on meshes and scale poorly to scenes with many objects or detailed shapes. Here we present SDF-Sim, the first learned rigid-body simulator designed for scale. We use learned signed distance functions (SDFs) to represent the object shapes and to speed up distance computation. We design the simulator to leverage SDFs and avoid the fundamental collision-detection bottleneck of previous simulators. For the first time in the literature, we demonstrate that GNN-based simulators can scale to scenes with hundreds of objects and up to 1.1 million nodes, where mesh-based approaches run out of memory. Finally, we show that SDF-Sim can be applied to real-world scenes by extracting SDFs from multi-view images.
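The collision-detection bottleneck can be seen in a toy edge count. The sketch below is a minimal illustration under simplifying assumptions: two flat, fully overlapping square plates sampled on a k × k grid, a fixed collision radius, node-pair proximity as a simplified stand-in for mesh-based collision detection (which in practice compares mesh elements rather than bare nodes), and half-space SDFs standing in for learned ones. All names are hypothetical.

```python
import numpy as np


def plate_nodes(k, z):
    """k x k grid of surface nodes on a unit square plate lying at height z."""
    xs = np.linspace(0.0, 1.0, k)
    gx, gy = np.meshgrid(xs, xs, indexing="ij")
    return np.stack([gx.ravel(), gy.ravel(), np.full(k * k, z)], axis=-1)


def pairwise_edge_count(nodes_a, nodes_b, radius):
    """Mesh-style stand-in: one edge per (node of A, node of B) pair closer than radius.

    Builds the full |A| x |B| distance matrix, so memory and edge count grow
    quadratically as the surface is discretized more finely.
    """
    dist = np.linalg.norm(nodes_a[:, None, :] - nodes_b[None, :, :], axis=-1)
    return int(np.count_nonzero(dist < radius))


def sdf_edge_count(nodes_a, nodes_b, z_a, z_b, radius):
    """SDF-style count: one edge per node whose distance to the other object is < radius.

    Plates are modelled as half-spaces (hypothetical analytic SDFs standing in for
    learned ones): A fills z <= z_a, B fills z >= z_b.
    """
    def sdf_a(p):
        return p[:, 2] - z_a  # positive above plate A

    def sdf_b(p):
        return z_b - p[:, 2]  # positive below plate B

    return int(np.count_nonzero(sdf_b(nodes_a) < radius)
               + np.count_nonzero(sdf_a(nodes_b) < radius))


if __name__ == "__main__":
    gap, radius = 0.05, 0.3
    for k in (5, 9, 17, 33):
        a, b = plate_nodes(k, 0.0), plate_nodes(k, gap)
        print(f"nodes/plate={k * k:5d}  pairwise={pairwise_edge_count(a, b, radius):7d}  "
              f"sdf={sdf_edge_count(a, b, 0.0, gap, radius):5d}")
```

Refining the grid from 5 × 5 to 9 × 9 nodes per plate (gap 0.05, radius 0.3) raises the SDF-style count from 50 to 162, roughly proportional to the node count, while the pairwise count jumps from 105 to 1325. This shows the quadratic-versus-linear gap in miniature.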
