Interaction Networks for Learning about Objects, Relations and Physics

Peter W. Battaglia, Razvan Pascanu, Matthew Lai, Danilo Rezende, Koray Kavukcuoglu

TL;DR

The paper addresses the challenge of reasoning about objects, relations, and physics in complex systems by introducing Interaction Networks (IN), a graph-based framework that separates relation-centric and object-centric processing to predict dynamics and infer abstract properties. IN combines structured knowledge, simulation-like dynamics, and deep learning, enabling accurate multi-step trajectory prediction and energy estimation while generalizing to systems with varying numbers and configurations of objects and relations. The authors demonstrate strong predictive performance across n-body, bouncing-ball, and string-spring domains, and show that IN can roll out thousands of steps with coherent behavior. This work advances AI toward a generalizable, differentiable physics engine capable of versatile reasoning in real-world domains and beyond.

Abstract

Reasoning about objects, relations, and physics is central to human intelligence, and a key goal of artificial intelligence. Here we introduce the interaction network, a model which can reason about how objects in complex systems interact, supporting dynamical predictions, as well as inferences about the abstract properties of the system. Our model takes graphs as input, performs object- and relation-centric reasoning in a way that is analogous to a simulation, and is implemented using deep neural networks. We evaluate its ability to reason about several challenging physical domains: n-body problems, rigid-body collision, and non-rigid dynamics. Our results show it can be trained to accurately simulate the physical trajectories of dozens of objects over thousands of time steps, estimate abstract quantities such as energy, and generalize automatically to systems with different numbers and configurations of objects and relations. Our interaction network implementation is the first general-purpose, learnable physics engine, and a powerful general framework for reasoning about objects and relations in a wide variety of complex real-world domains.

Paper Structure

This paper contains 21 sections, 4 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Schematic of an interaction network. a. For physical reasoning, the model takes objects and relations as input, reasons about their interactions, and applies the effects and physical dynamics to predict new states. b. For more complex systems, the model takes as input a graph that represents a system of objects, $o_j$, and relations, $\langle i,j,r_k \rangle_k$, instantiates the pairwise interaction terms, $b_k$, and computes their effects, $e_k$, via a relational model, $f_R(\cdot)$. The $e_k$ are then aggregated and combined with the $o_j$ and external effects, $x_j$, to generate input (as $c_j$), for an object model, $f_O(\cdot)$, which predicts how the interactions and dynamics influence the objects, $p$.
  • Figure 2: Prediction rollouts. Each column contains three panels of three video frames (with motion blur), each spanning 1000 rollout steps. Columns 1-2 are ground truth and model predictions for n-body systems, 3-4 are bouncing balls, and 5-6 are strings. Each model column was generated by a single model, trained on the underlying states of a system of the size in the top panel. The middle and bottom panels show its generalization to systems of different sizes and structure. For n-body, the training was on 6 bodies, and generalization was to 3 and 12 bodies. For balls, the training was on 6 balls, and generalization was to 3 and 9 balls. For strings, the training was on 15 masses with 1 end pinned, and generalization was to 30 masses with 0 and 2 ends pinned. The URLs to the full videos of each rollout are in Table \ref{tab:videos}.
  • Figure 3: Prediction experiment accuracy and generalization. Each colored bar represents the MSE between a model's predicted velocity and the ground truth physics engine's (the y-axes are log-scaled). Subplots (a-c) show n-body performance, (d-f) show balls, and (g-k) show strings. The leftmost subplot for each domain (a, d, g) compares the constant velocity model (black), baseline MLP (grey), dynamics-only IN (red), and full IN (blue). The other panels show the IN's generalization performance to different numbers and configurations of objects, as indicated by the subplot titles. For the string systems, the numbers correspond to: (the number of masses, how many ends were pinned).
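
The data flow described in the Figure 1 caption (interaction terms $b_k$, effects $e_k$, aggregated inputs $c_j$, predictions $p$) can be illustrated with a small sketch. This is not the authors' implementation: in the paper $f_R$ and $f_O$ are learned neural networks, whereas the `f_R` and `f_O` below are hand-written toy functions for a 1-D spring-like system. The helper names (`marshal`, `aggregate`, `interaction_network`), the `(sender, receiver, attribute)` ordering of each relation triple, and the time step `dt` are all assumptions made for illustration.

```python
# Illustrative sketch only. f_R and f_O are hypothetical hand-written stand-ins
# for the learned relational and object models described in the caption.

def marshal(objects, relations):
    """Build one interaction term b_k per relation.

    Each relation is (sender_index, receiver_index, attribute); this ordering
    is an assumption of the sketch.
    """
    return [(objects[s], objects[r], attr) for (s, r, attr) in relations]


def f_R(b):
    """Toy relational model: spring-like effect e_k exerted on the receiver."""
    sender, receiver, stiffness = b
    return stiffness * (sender[0] - receiver[0])


def aggregate(objects, relations, effects, external):
    """Sum the effects per receiver and pair them with o_j and x_j to form c_j."""
    summed = [0.0] * len(objects)
    for (_, receiver_index, _), e in zip(relations, effects):
        summed[receiver_index] += e
    return list(zip(objects, external, summed))


def f_O(c, dt=0.1):
    """Toy object model: predict the next velocity from the net effect."""
    (_, velocity), x, e = c
    return velocity + dt * (e + x)


def interaction_network(objects, relations, external):
    """Apply f_R to every relation, aggregate, then apply f_O to every object."""
    b = marshal(objects, relations)
    e = [f_R(b_k) for b_k in b]
    c = aggregate(objects, relations, e, external)
    return [f_O(c_j) for c_j in c]


# Two objects as (position, velocity), connected in both directions.
objects = [(0.0, 0.0), (1.0, 0.0)]
relations = [(0, 1, 2.0), (1, 0, 2.0)]
external = [0.0, 0.0]
predicted_velocities = interaction_network(objects, relations, external)
print(predicted_velocities)
```

Because `f_R` is applied per relation and `f_O` per object, the same functions run unchanged on a system with a different number of objects or relations, which is the structural property behind the generalization results shown in Figures 2 and 3. In this toy case the two objects are predicted to accelerate toward each other.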