Graph networks as learnable physics engines for inference and control

Alvaro Sanchez-Gonzalez, Nicolas Heess, Jost Tobias Springenberg, Josh Merel, Martin Riedmiller, Raia Hadsell, Peter Battaglia

TL;DR

This work introduces graph networks (GNs) as learnable physics engines that represent physical systems as object-centric graphs and support forward prediction, inference, and control. By composing edge-, node-, and global-update functions, the GN framework enables accurate one-step and rollout predictions, implicit system identification under partial observability, and gradient-based planning via model-predictive control, as well as policy learning with stochastic value gradients (SVG). The approach generalizes across multiple parametrized and structurally varied systems, including a real JACO robot, and supports zero-shot generalization to unseen topologies while remaining competitive with strong baselines. Differentiable, graph-based dynamics models offer a scalable path toward robust, data-efficient model-based planning and reasoning in complex physical domains, with potential extensions to real-world control, sim-to-real transfer, and stochastic environments.

Abstract

Understanding and interacting with everyday physical scenes requires rich knowledge about the structure of the world, represented either implicitly in a value or policy function, or explicitly in a transition model. Here we introduce a new class of learnable models--based on graph networks--which implement an inductive bias for object- and relation-centric representations of complex, dynamical systems. Our results show that as a forward model, our approach supports accurate predictions from real and simulated data, and surprisingly strong and efficient generalization, across eight distinct physical systems which we varied parametrically and structurally. We also found that our inference model can perform system identification. Our models are also differentiable, and support online planning via gradient-based trajectory optimization, as well as offline policy optimization. Our framework offers new opportunities for harnessing and exploiting rich knowledge about the world, and takes a key step toward building machines with more human-like representations of the world.
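The object- and relation-centric update that these models build on can be sketched as a single GN block pass: edges are updated first, then nodes, then the global features. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the names `gn_block` and `random_layer`, the sum aggregation, the feature sizes, and the random single-layer update functions are all hypothetical stand-ins for learned networks.

```python
import numpy as np

def gn_block(nodes, edges, senders, receivers, globals_, edge_fn, node_fn, global_fn):
    """One GN pass (illustrative): update edges, then nodes, then globals."""
    n_nodes, n_edges = nodes.shape[0], edges.shape[0]
    # 1. Edge update: each edge sees its features, both endpoint nodes, and the globals.
    g_edge = np.broadcast_to(globals_, (n_edges, globals_.shape[0]))
    new_edges = edge_fn(np.concatenate(
        [edges, nodes[senders], nodes[receivers], g_edge], axis=1))
    # 2. Node update: sum the updated incoming edges of each receiver node
    #    (sum aggregation is an assumption of this sketch).
    agg = np.zeros((n_nodes, new_edges.shape[1]))
    np.add.at(agg, receivers, new_edges)
    g_node = np.broadcast_to(globals_, (n_nodes, globals_.shape[0]))
    new_nodes = node_fn(np.concatenate([nodes, agg, g_node], axis=1))
    # 3. Global update: pool all updated nodes and edges.
    new_globals = global_fn(np.concatenate(
        [globals_, new_nodes.sum(axis=0), new_edges.sum(axis=0)]))
    return new_nodes, new_edges, new_globals

def random_layer(in_dim, out_dim, rng):
    """Hypothetical stand-in for a learned MLP: one random tanh layer."""
    w = rng.normal(size=(in_dim, out_dim)) * 0.1
    return lambda x: np.tanh(x @ w)

rng = np.random.default_rng(0)
NODE, EDGE, GLOB = 4, 2, 3           # input feature sizes (arbitrary)
NODE_OUT, EDGE_OUT, GLOB_OUT = 6, 5, 3
edge_fn = random_layer(EDGE + 2 * NODE + GLOB, EDGE_OUT, rng)
node_fn = random_layer(NODE + EDGE_OUT + GLOB, NODE_OUT, rng)
global_fn = random_layer(GLOB + NODE_OUT + EDGE_OUT, GLOB_OUT, rng)

def chain_graph(n_bodies, rng):
    """A chain of bodies connected by joints, with one edge per direction."""
    left = np.arange(n_bodies - 1)
    senders = np.concatenate([left, left + 1])
    receivers = np.concatenate([left + 1, left])
    nodes = rng.normal(size=(n_bodies, NODE))
    edges = rng.normal(size=(senders.shape[0], EDGE))
    return nodes, edges, senders, receivers, rng.normal(size=GLOB)

# The same update functions apply to graphs of different sizes.
out3 = gn_block(*chain_graph(3, rng), edge_fn, node_fn, global_fn)
out6 = gn_block(*chain_graph(6, rng), edge_fn, node_fn, global_fn)
```

Because the update functions are shared across all edges and all nodes, the same parameters apply unchanged to a 3-body and a 6-body chain, which is the property that makes evaluation on unseen topologies possible in principle.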

Paper Structure

This paper contains 61 sections, 6 equations, 16 figures, 2 tables, 6 algorithms.

Figures (16)

  • Figure 1: (Top) Our experimental physical systems. (Bottom) Samples of parametrized versions of these systems (videos accompany the paper).
  • Figure 2: Graph representations and GN-based models. (a) A physical system's bodies and joints can be represented by a graph's nodes and edges, respectively. (b) A GN block takes a graph as input and returns a graph with the same structure but different edge, node, and global features as output (see the paper's graph network algorithm). (c) A feed-forward GN-based forward model for learning one-step predictions. (d) A recurrent GN-based forward model. (e) A recurrent GN-based inference model for system identification.
  • Figure 3: Evaluation rollout in a Swimmer6 (trajectory videos accompany the paper). (a) Frames of ground truth and predicted states over a 100-step trajectory. (b-e) State sequence predictions for link #3 of the Swimmer. The subplots are (b) $x,y,z$-position, (c) $q_0,q_1,q_2,q_3$-quaternion orientation, (d) $x,y,z$-linear velocity, and (e) $x,y,z$-angular velocity. [au] indicates arbitrary units.
  • Figure 4: (a) One-step and (b) 100-step rollout errors for different models and training regimes (different bars) on different test data (x-axis labels), relative to the constant prediction baseline (black dashed line). Blue bars are GN models trained on single systems. Red and yellow bars are GN models trained on multiple systems, with (yellow) and without (red) parametric variation. Note that including Cheetah in multiple-system training reduced performance (light red vs dark red bars), which suggests that sharing a model across systems might not always be beneficial.
  • Figure 5: Prediction errors, on (a) one-step and (b) 20-step evaluations, between the best MLP baseline and the best GN model after 72 hours of training. Swimmer6 prediction errors, on (c) one-step and (d) 20-step evaluations, between the best MLP baseline and the best GN model for data in the training set (dark), data in the validation set (medium), and data from DDPG agent trajectories (light). The numbers above the bars indicate the ratio between the corresponding generalization test error (medium or light) and the training error (dark).
  • ...and 11 more figures
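
The "relative to the constant prediction baseline" normalization in the Figure 4 caption can be illustrated with a small sketch. It assumes the baseline simply repeats the initial state at every step and that the error is a mean squared error; both are assumptions of this example, and `relative_rollout_error` is a hypothetical helper name rather than something taken from the paper.

```python
import numpy as np

def relative_rollout_error(pred, truth, initial_state):
    """Model MSE divided by the MSE of a baseline that repeats the initial state.

    pred, truth: arrays of shape (steps, state_dim).
    initial_state: array of shape (state_dim,).
    A value of 1.0 means the model is no better than the constant baseline.
    """
    model_mse = np.mean((pred - truth) ** 2)
    baseline = np.broadcast_to(initial_state, truth.shape)
    baseline_mse = np.mean((baseline - truth) ** 2)
    return model_mse / baseline_mse

# Toy 3-step trajectory with a 1-D state that starts at 0 and drifts upward.
truth = np.array([[1.0], [2.0], [3.0]])
start = np.array([0.0])

perfect = relative_rollout_error(truth, truth, start)
constant = relative_rollout_error(np.zeros_like(truth), truth, start)
slightly_off = relative_rollout_error(truth + 0.1, truth, start)
```

Under this convention, a bar below the dashed line at 1.0 indicates a model that beats the constant prediction.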