From Kepler to Newton: Inductive Biases Guide Learned World Models in Transformers
Ziming Liu, Sophia Sanborn, Surya Ganguli, Andreas Tolias
TL;DR
The paper investigates whether general-purpose transformers can learn true world models of planetary motion rather than merely achieving high predictive accuracy. It identifies three minimal inductive biases that steer learning toward mechanistic dynamics: spatial smoothness via continuous regression (or a small vocabulary), spatial stability via noisy-context training, and temporal locality via restricted attention. The results show that spatial smoothness can enable a coherent spatial map, that noisy-context training mitigates error accumulation, and that temporal locality shifts the learned dynamics from Keplerian curve-fitting to Newtonian force-based representations. Context length therefore controls whether a Newtonian or a Keplerian model emerges. This work demonstrates that simple architectural biases can convert a predictor into a scientific reasoner, advancing automated discovery of physical laws in AI systems.
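To make the Keplerian versus Newtonian distinction concrete, the two descriptions can be written side by side for a single body orbiting a fixed central mass. This is a textbook sketch rather than the paper's own formulation: the symbols a (semi-major axis), e (eccentricity), \theta (true anomaly), G M (gravitational parameter), and \Delta t (time step) are standard notation assumed here, not taken from the source.

```latex
% Keplerian description: a global geometric fit to the whole orbit
r(\theta) = \frac{a\,(1 - e^{2})}{1 + e\cos\theta}

% Newtonian description: a local law tying acceleration to the current position
\ddot{\mathbf{r}} = -\frac{G M}{\lVert \mathbf{r} \rVert^{3}}\,\mathbf{r}

% Discretized (Stormer-Verlet) form: the next position needs only the two most recent ones
\mathbf{r}_{t+1} = 2\,\mathbf{r}_{t} - \mathbf{r}_{t-1}
                   - \Delta t^{2}\,\frac{G M}{\lVert \mathbf{r}_{t} \rVert^{3}}\,\mathbf{r}_{t}
```

The ellipse parameters a and e summarize an entire trajectory, so estimating them benefits from a long history. The discretized Newtonian update needs only two recent positions, which is one way to see why a short context could favor a force-based solution.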
Abstract
Can general-purpose AI architectures go beyond prediction to discover the physical laws governing the universe? True intelligence relies on "world models" -- causal abstractions that allow an agent to not only predict future states but understand the underlying governing dynamics. While previous "AI Physicist" approaches have successfully recovered such laws, they typically rely on strong, domain-specific priors that effectively "bake in" the physics. Conversely, Vafa et al. recently showed that generic Transformers fail to acquire these world models, achieving high predictive accuracy without capturing the underlying physical laws. We bridge this gap by systematically introducing three minimal inductive biases. We show that ensuring spatial smoothness (by formulating prediction as continuous regression) and stability (by training with noisy contexts to mitigate error accumulation) enables generic Transformers to surpass prior failures and learn a coherent Keplerian world model, successfully fitting ellipses to planetary trajectories. However, true physical insight requires a third bias: temporal locality. By restricting the attention window to the immediate past -- imposing the simple assumption that future states depend only on the local state rather than a complex history -- we force the model to abandon curve-fitting and discover Newtonian force representations. Our results demonstrate that simple architectural choices determine whether an AI becomes a curve-fitter or a physicist, marking a critical step toward automated scientific discovery.
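As a rough illustration of two of the biases named in the abstract, the sketch below builds a local causal attention mask (temporal locality) and a noisy-context training pair (stability). It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the Gaussian noise model, the choice to keep targets clean, and the `window` and `sigma` parameters are all hypothetical.

```python
import random


def local_causal_mask(seq_len, window):
    """Return mask[i][j] == True where query position i may attend to key j.

    Position i sees itself and the previous (window - 1) positions only.
    The window size is a hypothetical knob, not a value from the paper.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [[0 <= i - j < window for j in range(seq_len)]
            for i in range(seq_len)]


def add_context_noise(trajectory, sigma, rng):
    """Return a copy of a list of (x, y) points with Gaussian noise added.

    Isotropic Gaussian noise is an assumption; the abstract only says
    that contexts are noisy.
    """
    return [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
            for x, y in trajectory]


def make_training_pair(trajectory, sigma, rng):
    """Pair a noisy context with the next positions as regression targets.

    Keeping the targets noise-free is an assumption made for this sketch.
    """
    noisy_inputs = add_context_noise(trajectory[:-1], sigma, rng)
    clean_targets = list(trajectory[1:])
    return noisy_inputs, clean_targets


# Toy usage: four points on a unit circle, attention limited to two steps.
rng = random.Random(0)
orbit = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
inputs, targets = make_training_pair(orbit, sigma=0.01, rng=rng)
mask = local_causal_mask(seq_len=len(inputs), window=2)
```

In an actual Transformer the mask would be applied to the attention scores before the softmax, and the regression targets would be fed to a continuous loss such as mean squared error. Widening `window` toward the full sequence length recovers ordinary causal attention, which is the long-context regime the abstract associates with curve-fitting.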
