Graph networks as learnable physics engines for inference and control
Alvaro Sanchez-Gonzalez, Nicolas Heess, Jost Tobias Springenberg, Josh Merel, Martin Riedmiller, Raia Hadsell, Peter Battaglia
TL;DR
This work introduces graph networks (GNs) as learnable physics engines that represent physical systems as object-centric graphs and support forward prediction, inference, and control. By composing edge-, node-, and global-update functions, the GN framework enables accurate one-step and roll-out predictions, implicit system identification under partial observability, and gradient-based planning via model-predictive control, as well as policy learning with stochastic value gradients (SVG). The approach generalizes across multiple parametrically and structurally varied systems, including a real JACO robot arm, and supports zero-shot generalization to unseen topologies while remaining competitive with strong baselines. Because the learned graph-based dynamics are differentiable, they offer a scalable path toward data-efficient model-based planning and reasoning in complex physical domains, with potential extensions to real-world control, sim-to-real transfer, and stochastic environments.
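To make the composition of edge-, node-, and global-update functions concrete, the following is a minimal NumPy sketch of a single GN block. It is an illustration only, not the authors' implementation: the function name `gn_block`, the weight matrices, the feature sizes, the sum aggregation, and the use of one linear layer with `tanh` in place of each learned update function are all assumptions made for brevity.

```python
import numpy as np

def gn_block(nodes, edges, senders, receivers, globals_, w_edge, w_node, w_global):
    """One pass of edge, node, and global updates (hypothetical sketch)."""
    n_nodes, n_edges = nodes.shape[0], edges.shape[0]

    # Edge update: each edge sees its own features, both endpoint nodes,
    # and the global vector.
    g_per_edge = np.broadcast_to(globals_, (n_edges, globals_.shape[0]))
    edge_in = np.concatenate(
        [edges, nodes[senders], nodes[receivers], g_per_edge], axis=1)
    new_edges = np.tanh(edge_in @ w_edge)

    # Aggregate updated edges at their receiving node (sum is assumed here).
    incoming = np.zeros((n_nodes, new_edges.shape[1]))
    np.add.at(incoming, receivers, new_edges)

    # Node update: each node sees its features, aggregated incoming edges,
    # and the global vector.
    g_per_node = np.broadcast_to(globals_, (n_nodes, globals_.shape[0]))
    node_in = np.concatenate([nodes, incoming, g_per_node], axis=1)
    new_nodes = np.tanh(node_in @ w_node)

    # Global update: sees summed nodes, summed edges, and the old global vector.
    global_in = np.concatenate(
        [new_nodes.sum(axis=0), new_edges.sum(axis=0), globals_])
    new_globals = np.tanh(global_in @ w_global)
    return new_nodes, new_edges, new_globals

# Toy system: a chain of 3 bodies joined by 2 joints, with one directed
# edge per direction (4 edges). All sizes below are arbitrary.
rng = np.random.default_rng(0)
d_node, d_edge, d_global, d_out = 5, 3, 2, 4
nodes = rng.normal(size=(3, d_node))
edges = rng.normal(size=(4, d_edge))
senders = np.array([0, 1, 1, 2])
receivers = np.array([1, 0, 2, 1])
globals_ = rng.normal(size=d_global)

w_edge = rng.normal(size=(d_edge + 2 * d_node + d_global, d_out))
w_node = rng.normal(size=(d_node + d_out + d_global, d_out))
w_global = rng.normal(size=(2 * d_out + d_global, d_out))

new_nodes, new_edges, new_globals = gn_block(
    nodes, edges, senders, receivers, globals_, w_edge, w_node, w_global)
```

Because the same edge and node functions are applied to every edge and node, the weights do not depend on how many bodies or joints the graph contains, which is one way to read the claim about generalizing to unseen topologies.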
Abstract
Understanding and interacting with everyday physical scenes requires rich knowledge about the structure of the world, represented either implicitly in a value or policy function, or explicitly in a transition model. Here we introduce a new class of learnable models, based on graph networks, which implement an inductive bias for object- and relation-centric representations of complex, dynamical systems. Our results show that as a forward model, our approach supports accurate predictions from real and simulated data, and surprisingly strong and efficient generalization, across eight distinct physical systems which we varied parametrically and structurally. We also found that our inference model can perform system identification. Our models are also differentiable, and support online planning via gradient-based trajectory optimization, as well as offline policy optimization. Our framework offers new opportunities for harnessing and exploiting rich knowledge about the world, and takes a key step toward building machines with more human-like representations of the world.
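As a rough illustration of what gradient-based trajectory optimization through a differentiable model involves, the sketch below optimizes a control sequence by gradient descent. It rests on loud assumptions: a hand-written linear point-mass model (`A`, `B`) stands in for a learned graph-network forward model, the gradient comes from a manually coded backward pass rather than automatic differentiation, and the horizon, step size, and regularization weight are arbitrary choices.

```python
import numpy as np

dt = 0.1
# Toy stand-in dynamics (assumption): state = (position, velocity),
# control = acceleration. A learned forward model would replace this.
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([0.0, dt])

def rollout(x0, controls):
    """Unroll the model over the control sequence and return all states."""
    xs = [x0]
    for u in controls:
        xs.append(A @ xs[-1] + B * u)
    return np.stack(xs)

def cost_and_grad(x0, controls, goal, reg=1e-3):
    """Terminal-state cost plus control penalty, with gradient w.r.t. controls."""
    xs = rollout(x0, controls)
    err = xs[-1] - goal
    cost = err @ err + reg * np.sum(controls ** 2)

    # Backward pass through the unrolled model (backpropagation through time).
    grad = np.zeros_like(controls)
    adjoint = 2.0 * err                      # d cost / d x_T
    for t in reversed(range(len(controls))):
        grad[t] = B @ adjoint + 2.0 * reg * controls[t]
        adjoint = A.T @ adjoint              # d cost / d x_t
    return cost, grad

x0 = np.array([0.0, 0.0])
goal = np.array([1.0, 0.0])                  # reach position 1 at rest
controls = np.zeros(20)

initial_cost, _ = cost_and_grad(x0, controls, goal)
for _ in range(500):
    _, grad = cost_and_grad(x0, controls, goal)
    controls -= 0.5 * grad                   # plain gradient descent step
final_cost, _ = cost_and_grad(x0, controls, goal)
```

In a model-predictive control loop, one would typically apply only the first optimized control, observe the new state, and re-run the optimization from there.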
