Emergence of Computational Structure in a Neural Network Physics Simulator

Rohan Hitchcock, Gary W. Delaney, Jonathan H. Manton, Richard Scalzo, Jingge Zhu

TL;DR

The paper investigates how interpretable computational structures emerge in a transformer-based neural network trained to simulate a particle system under gravity. By introducing collision detection as a measurable head behavior and analyzing the attention-distance correlation alongside the local learning coefficient (LLC), the authors show that collision-detection heads arise in conjunction with degenerate loss-landscape geometry and power-law dynamics, described via an effective potential. They draw parallels to second-order phase transitions to interpret these dynamics and discuss implications for convergence times and early training interventions. While offering a mechanistic view of emergent computation in this physics simulator, the study remains limited to a single model and calls for broader validation across architectures and tasks.

Abstract

Neural networks often have identifiable computational structures - components of the network which perform an interpretable algorithm or task - but the mechanisms by which these emerge and the best methods for detecting these structures are not well understood. In this paper we investigate the emergence of computational structure in a transformer-like model trained to simulate the physics of a particle system, where the transformer's attention mechanism is used to transfer information between particles. We show that (a) structures emerge in the attention heads of the transformer which learn to detect particle collisions, (b) the emergence of these structures is associated with degenerate geometry in the loss landscape, and (c) the dynamics of this emergence follow a power law. This suggests that these components are governed by a degenerate "effective potential". These results have implications for the convergence time of computational structure within neural networks and suggest that the emergence of computational structure can be detected by studying the dynamics of network components.
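
The link between a degenerate potential and power-law dynamics can be illustrated with a one-dimensional toy problem. The sketch below is not the paper's model or analysis: the potentials $V(x) = x^4/4$ and $V(x) = x^2/2$, the step size, the fitting windows and the helper names `gradient_flow` and `loglog_slope` are all assumptions chosen for illustration. It integrates the gradient flow $\dot{x} = -V'(x)$ for both potentials and measures the slope of $\log |x|$ against $\log t$.

```python
import math

def gradient_flow(grad, x0, dt, n_steps):
    """Integrate dx/dt = -grad(x) with forward Euler; return times and |x| samples."""
    times, values = [], []
    x = x0
    for step in range(1, n_steps + 1):
        x -= dt * grad(x)
        times.append(step * dt)
        values.append(abs(x))
    return times, values

def loglog_slope(times, values, t_min, t_max):
    """Least-squares slope of log(value) against log(time) for t_min <= t <= t_max."""
    pts = [
        (math.log(t), math.log(v))
        for t, v in zip(times, values)
        if t_min <= t <= t_max and v > 0.0
    ]
    n = len(pts)
    mean_x = sum(px for px, _ in pts) / n
    mean_y = sum(py for _, py in pts) / n
    cov = sum((px - mean_x) * (py - mean_y) for px, py in pts)
    var = sum((px - mean_x) ** 2 for px, _ in pts)
    return cov / var

# Toy degenerate potential (assumption): V(x) = x^4 / 4, zero curvature at the minimum.
t_deg, x_deg = gradient_flow(lambda x: x ** 3, x0=1.0, dt=0.01, n_steps=100_000)
slope_deg_early = loglog_slope(t_deg, x_deg, t_min=10.0, t_max=100.0)
slope_deg_late = loglog_slope(t_deg, x_deg, t_min=100.0, t_max=1000.0)

# Toy non-degenerate potential (assumption): V(x) = x^2 / 2, positive curvature.
t_non, x_non = gradient_flow(lambda x: x, x0=1.0, dt=0.01, n_steps=2_000)
slope_non_early = loglog_slope(t_non, x_non, t_min=1.0, t_max=5.0)
slope_non_late = loglog_slope(t_non, x_non, t_min=10.0, t_max=20.0)

print("degenerate, log-log slope (early, late):", slope_deg_early, slope_deg_late)
print("non-degenerate, log-log slope (early, late):", slope_non_early, slope_non_late)
```

For the quartic potential the exact solution is $x(t) = x_0 / \sqrt{1 + 2 x_0^2 t}$, so the log-log slope stays roughly constant near $-1/2$ across both windows, which is the signature of a power law. For the quadratic potential the decay is exponential, so the log-log slope keeps steepening and no single exponent describes it.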

Paper Structure

This paper contains 39 sections, 3 theorems, 39 equations, 35 figures, 5 tables.

Key Result

Theorem A.1

There exist constants $\theta \in [\tfrac{1}{2}, 1)$ and $C, C' > 0$ and an open neighbourhood $V$ of $\boldsymbol{x}^*$ such that: where $d(\boldsymbol{x}, Z) = \inf \{\|\boldsymbol{x} - \boldsymbol{y}\| \mid \boldsymbol{y} \in Z\}$ is the distance between $\boldsymbol{x}$ and $Z$.

Figures (35)

  • Figure 1: In collision detection heads we see three simultaneous phenomena: collision detection behaviour emerges (bottom), the attention-distance correlation curve enters a power law regime (top, blue) and the local learning coefficient plateaus (top, green). In this head, we see a transition between two distinct phases of collision detection. The behaviour in the first collision detection phase is characterised by collision detection between particles below a certain $y$-level (bottom, B), as compared to mostly universal collision detection in the second phase (bottom, C). This transition is reflected in the attention-distance correlation curve as different power law exponents, visualised as different slopes on the log-log plot (top, blue). The local learning coefficient plateaus during the first collision detection phase before continuing up to its final value.
  • Figure 2: We track the collision detection score for all attention heads over training across all five training runs and aggregate them into a two-dimensional histogram. At the end of training we see a dense cluster of attention heads with collision detection score near $0.95$ and a dense cluster near zero, corresponding respectively to true collision detection heads and heads without collision detection behaviour. Between the two large clusters there are several small clusters corresponding to the partial collision detection heads. To improve the dynamic range of this plot, all density values in the 95th percentile are set to the same maximal colour intensity. Versions of this plot showing only the attention heads from a single training run are given in \ref{sec:additional_contact_score_plots}.
  • Figure 3: From left to right, examples of a true contact detection head, a partial collision detection head, and a head which is neither. These have collision detection scores 0.95, 0.78 and 0.02 respectively. We show the heads' behaviour when the particles are in free-fall (top) and when the particles are close to settled (bottom). The attention score assigned by particle $i$ to particle $j$ is indicated by a red line with opacity proportional to the attention score.
  • Figure 4: The model architecture. Layers which operate on particle embeddings in-parallel are shaded in blue. Layers which mix information between particle embeddings are shaded in green. Data is shaded in red. Operations without any trainable parameters are left unshaded.
  • Figure 5: The number of true collision detection heads and partial collision detection heads within each transformer block at the end of training. There are eight attention heads in each block. Data is shown for all five training runs, each as a separate bar.
  • ...and 30 more figures

Theorems & Definitions (7)

  • Theorem A.1: Łojasiewicz inequalities
  • Definition A.2: Łojasiewicz exponent
  • Corollary A.3
  • proof
  • Corollary A.4
  • proof
  • Example A.5: see Strogatz, Nonlinear Dynamics and Chaos (2015), Chapter 3.4