From Kepler to Newton: Inductive Biases Guide Learned World Models in Transformers

Ziming Liu, Sophia Sanborn, Surya Ganguli, Andreas Tolias

TL;DR

The paper investigates whether general-purpose transformers can learn a true world model of planetary motion rather than merely achieving high predictive accuracy. It identifies three minimal inductive biases that steer learning toward mechanistic dynamics: spatial smoothness (continuous regression, or a small vocabulary when tokenizing), spatial stability (training with noisy contexts), and temporal locality (a restricted attention window). Spatial smoothness lets a coherent spatial map emerge, noisy-context training mitigates error accumulation during generation, and temporal locality shifts the learned dynamics from Keplerian curve-fitting to Newtonian force-based representations. Context length controls which model emerges: long contexts yield the Keplerian model, short contexts the Newtonian one. The authors conclude that simple architectural choices determine whether a transformer behaves as a curve-fitter or a physicist, a step toward automated discovery of physical laws.
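The noisy-context idea can be illustrated with a minimal sketch. It assumes 2D positions, i.i.d. Gaussian noise of scale `sigma` added only to the input context, and a noise-free regression target; none of these details are confirmed by this summary. The function names and the toy circular orbit are hypothetical.

```python
import math
import random

def add_context_noise(context, sigma, rng):
    """Return a copy of `context` with i.i.d. Gaussian noise on every coordinate."""
    return [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
            for x, y in context]

def make_training_pair(trajectory, t, sigma, rng):
    """Build (noisy context, clean target) for predicting point t from points [0, t)."""
    noisy_context = add_context_noise(trajectory[:t], sigma, rng)
    target = trajectory[t]  # assumption: only the inputs are perturbed
    return noisy_context, target

# Toy data: a unit circular orbit sampled at 100 points (not the paper's dataset).
orbit = [(math.cos(2 * math.pi * k / 100), math.sin(2 * math.pi * k / 100))
         for k in range(100)]

rng = random.Random(0)
context, target = make_training_pair(orbit, t=50, sigma=0.1, rng=rng)
```

Training on perturbed contexts exposes the model to inputs that resemble its own slightly wrong predictions, which is one plausible reading of why the bias mitigates error accumulation. Setting `sigma=0` recovers naive training.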

Abstract

Can general-purpose AI architectures go beyond prediction to discover the physical laws governing the universe? True intelligence relies on "world models" -- causal abstractions that allow an agent to not only predict future states but understand the underlying governing dynamics. While previous "AI Physicist" approaches have successfully recovered such laws, they typically rely on strong, domain-specific priors that effectively "bake in" the physics. Conversely, Vafa et al. recently showed that generic Transformers fail to acquire these world models, achieving high predictive accuracy without capturing the underlying physical laws. We bridge this gap by systematically introducing three minimal inductive biases. We show that ensuring spatial smoothness (by formulating prediction as continuous regression) and stability (by training with noisy contexts to mitigate error accumulation) enables generic Transformers to surpass prior failures and learn a coherent Keplerian world model, successfully fitting ellipses to planetary trajectories. However, true physical insight requires a third bias: temporal locality. By restricting the attention window to the immediate past -- imposing the simple assumption that future states depend only on the local state rather than a complex history -- we force the model to abandon curve-fitting and discover Newtonian force representations. Our results demonstrate that simple architectural choices determine whether an AI becomes a curve-fitter or a physicist, marking a critical step toward automated scientific discovery.
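Temporal locality, as described above, amounts to limiting how far back each position may attend. The following is a minimal sketch of such a mask, assuming a causal sliding window; the helper name `local_causal_mask` and the window size are illustrative and not taken from the paper.

```python
def local_causal_mask(seq_len, window):
    """mask[i][j] is True when query position i may attend to key position j.

    A position sees itself and at most `window - 1` positions before it,
    and never any future position.
    """
    return [[j <= i and i - j < window for j in range(seq_len)]
            for i in range(seq_len)]

mask = local_causal_mask(seq_len=5, window=2)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
# x....
# xx...
# .xx..
# ..xx.
# ...xx
```

Setting `window` to at least `seq_len` recovers the ordinary full causal mask, so a single parameter moves between the long-context and short-context regimes. A window of 2 is the smallest that lets a position-only model infer velocity from consecutive points; the window sizes actually used in the paper are not stated in this summary.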

Paper Structure (18 sections, 4 equations, 12 figures)


Figures (12)

  • Figure 1: Visual abstract. Top left: The problem setup of Vafa et al.: planetary motion prediction is formulated as next token(s) prediction. Bottom left: Inductive biases are key to learning Newtonian world models. Three inductive biases are identified and used to fix respective failure modes. Right: The context length controls the world model learned by transformers. Long context lengths lead to the Keplerian model (global, geometry-based), while small context lengths lead to the Newtonian model (local, force-based).
  • Figure 2: Analyzing the embeddings of the transformer model used in Vafa et al. (a) Illustration of training dynamics of token embeddings: embeddings are randomly initialized (left), gradually gain spatial structure during training (middle), and require substantial compute and data to reach the true spatial map (right). (b) The learned embeddings exhibit poor locality: circular structures in the true coordinate space (left) fragment into four point clouds, losing fine-grained structure within each quadrant (right). (c) Learned embeddings show poor linear decodability to the true spatial map (left for $x$, right for $y$).
  • Figure 3: Spatial map emergence strongly depends on tokenization, and weakly on embedding dimensions. (a) Evolution of learned embeddings, i.e., token embeddings projected onto the best linearly decodable direction at the last step (200). From left to right: vocabulary size $V=100, 1000, 10000$. Each inset shows the true coordinate and the learned coordinate. The spatial map emerges easily for a small vocabulary size $V$, but becomes poorly emergent for large $V$. (b) Spatial map quality, measured by $R^2$ between the true coordinate and the learned coordinate. Left: $1-R^2$ obeys a scaling law with respect to vocabulary size $V$ and training tokens $D$. Middle: $R^2$ saturates when the embedding dimension $N$ is beyond a critical value at 8. Right: Scaling up $V$ or $N$ does not improve, but actually harms spatial map emergence.
  • Figure 4: Error accumulation and fixing it by adding context noise in training. Each subplot shows the ground truth trajectory (blue solid circle), the 50 conditioning points (green), and the 50 generated points (red). From left to right: training with different levels of context noise $\sigma$. Naively training a regression-based transformer leads to severe error accumulation (left, $\sigma=0$), whereas adding a reasonable amount of noise $\sigma$ (e.g., $\sigma=0.1$) to contexts during training substantially improves robustness.
  • Figure 5: Comparing regression and classification transformers ($D$: training tokens), using mean distance error as the metric to evaluate predictive performance. Left: regression models exhibit a sweet spot in the context noise scale $\sigma$. Middle: regression models also exhibit a sweet spot in the vocabulary size $V$. Right: comparing regression and classification across different training data sizes $D$. Regression models consistently outperform classification models when their best hyperparameters ($\sigma$ or $V$) are selected. However, naively trained regression models ($\sigma=0$) underperform the best classification models when the training data is large.
  • ...and 7 more figures
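
The role of the vocabulary size $V$ in Figures 3 and 5 can be made concrete with a small sketch of coordinate tokenization and the mean distance error metric. It assumes uniform binning of each coordinate over $[-1, 1]$, which is a hypothetical scheme and not necessarily the tokenizer used in the paper; the toy orbit is also illustrative.

```python
import math

def tokenize(x, vocab_size, lo=-1.0, hi=1.0):
    """Map a coordinate in [lo, hi] to an integer token id in [0, vocab_size)."""
    frac = (x - lo) / (hi - lo)
    return min(vocab_size - 1, max(0, int(frac * vocab_size)))

def detokenize(token, vocab_size, lo=-1.0, hi=1.0):
    """Map a token id back to the center of its bin."""
    return lo + (token + 0.5) * (hi - lo) / vocab_size

def mean_distance_error(pred, true):
    """Average Euclidean distance between predicted and true 2D points."""
    return sum(math.hypot(px - tx, py - ty)
               for (px, py), (tx, ty) in zip(pred, true)) / len(true)

# Toy data: a unit circular orbit sampled at 200 points (not the paper's dataset).
orbit = [(math.cos(2 * math.pi * k / 200), math.sin(2 * math.pi * k / 200))
         for k in range(200)]

# Round-trip error from discretization alone, for the V values shown in Figure 3.
errors = {}
for V in (100, 1000, 10000):
    decoded = [(detokenize(tokenize(x, V), V), detokenize(tokenize(y, V), V))
               for x, y in orbit]
    errors[V] = mean_distance_error(decoded, orbit)
```

Under this binning the round-trip error is bounded by $\sqrt{2}/V$, so a larger vocabulary lowers the discretization floor. Figure 3 reports that the spatial map nevertheless emerges less readily as $V$ grows, which is consistent with the sweet spot in $V$ described in Figure 5; a regression output avoids the discretization step altogether.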