What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models

Keyon Vafa, Peter G. Chang, Ashesh Rambachan, Sendhil Mullainathan

TL;DR

Foundation models often predict sequences well without acquiring a coherent world model. The inductive bias probe framework measures how a model's extrapolations align with a postulated world model using two metrics, $R$-IB and $D$-IB, each calibrated against an oracle. Across the orbital mechanics, lattice, and Othello domains, models frequently fail to inherit Newtonian or domain-specific world models. They instead rely on heuristics or next-token-based biases, and analyses of their predicted forces reveal force laws that are not universal. These findings suggest a practical framework for diagnosing and guiding the development of world-model-aligned inductive biases in foundation models.

Abstract

Foundation models are premised on the idea that sequence prediction can uncover deeper domain understanding, much like how Kepler's predictions of planetary motion later led to the discovery of Newtonian mechanics. However, evaluating whether these models truly capture deeper structure remains a challenge. We develop a technique for evaluating foundation models that examines how they adapt to synthetic datasets generated from some postulated world model. Our technique measures whether the foundation model's inductive bias aligns with the world model, and so we refer to it as an inductive bias probe. Across multiple domains, we find that foundation models can excel at their training tasks yet fail to develop inductive biases towards the underlying world model when adapted to new tasks. We particularly find that foundation models trained on orbital trajectories consistently fail to apply Newtonian mechanics when adapted to new physics tasks. Further analysis reveals that these models behave as if they develop task-specific heuristics that fail to generalize.

Paper Structure

This paper contains 19 sections, 10 equations, 10 figures, 8 tables.

Figures (10)

  • Figure 1: Each pair of panels illustrates the trajectory of a planet in the solar system and its gravitational force vectors, comparing the true Newtonian forces (left) to the predicted forces (right) from a transformer foundation model pretrained on orbital sequences and fine-tuned to predict forces. While the model excels at generating accurate predictions of planetary trajectories, it does not have an inductive bias toward true Newtonian mechanics; moreover, its force predictions recover a nonsensical force law, as revealed by symbolic regression.
  • Figure 2: An inductive bias probe measures whether a foundation model has an inductive bias toward a given world model. The probe involves repeatedly fitting a foundation model to small, synthetic datasets and comparing the functions it learns to the functions in the given world model.
  • Figure 3: An illustration of the inductive bias probe when the given world model has a finite state space. Each row represents a function and each column represents an input $x_i$, with inputs belonging to the same state grouped together. The shading illustrates each function's value at the corresponding input. A foundation model has low R-IB (middle) if it learns functions that divide states, while a foundation model has low D-IB (right) if it learns functions that merge states.
  • Figure 4: Inductive bias probe performance (\ref{eqn:indutive_bias_continuous}) for a transformer pretrained on orbital trajectories. Values shown are the absolute value of the inductive bias metric. A 45-degree line would indicate perfect inductive bias toward an oracle that extrapolates based on the Newtonian state vector.
  • Figure 5: Inductive bias probe results (R-IB and D-IB) for the lattice problem as a function of the underlying number of states. A different model is pre-trained on data consistent with each number of states and its inductive bias for that state structure is recorded using the metrics in \ref{sec:framework}.
  • ...and 5 more figures
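
The finite-state picture in the Figure 2 and Figure 3 captions can be illustrated with a toy sketch. This is not the paper's metric: the actual R-IB and D-IB are defined in the paper's framework section and calibrated against an oracle, which is omitted here. The sketch only assumes that a learned function is a mapping from inputs to binary outputs, and it reports two uncalibrated quantities: how often a function agrees on inputs sharing a state, and how often it disagrees on inputs from different states. All names (`probe_scores`, `state_of`, the hand-written functions) are hypothetical stand-ins for functions a model might learn when fit to small synthetic datasets.

```python
from itertools import combinations

# Hypothetical toy world model: six inputs grouped into three states.
state_of = {0: "A", 1: "A", 2: "B", 3: "B", 4: "C", 5: "C"}
inputs = sorted(state_of)


def probe_scores(learned_functions, inputs, state_of):
    """Return (same_state_agreement, cross_state_disagreement).

    Each learned function is a dict mapping an input to a binary output.
    Both scores are raw fractions; no oracle calibration is applied.
    """
    same_agree = same_total = cross_disagree = cross_total = 0
    for f in learned_functions:
        for a, b in combinations(inputs, 2):
            agree = f[a] == f[b]
            if state_of[a] == state_of[b]:
                same_total += 1
                same_agree += int(agree)
            else:
                cross_total += 1
                cross_disagree += int(not agree)
    return same_agree / same_total, cross_disagree / cross_total


# Hand-written stand-ins for functions a model might learn (assumptions).
respects_states = [
    {x: int(state_of[x] == "B") for x in inputs},
    {x: int(state_of[x] == "A") for x in inputs},
]
divides_states = [{x: x % 2 for x in inputs}]  # splits inputs within a state
merges_states = [{x: 0 for x in inputs}]       # collapses all states together

for name, fns in [("respects", respects_states),
                  ("divides", divides_states),
                  ("merges", merges_states)]:
    same, cross = probe_scores(fns, inputs, state_of)
    print(f"{name}: same-state agreement={same:.2f}, "
          f"cross-state disagreement={cross:.2f}")
```

In this toy setting, functions that divide states score low on same-state agreement, and functions that merge states score low on cross-state disagreement, which mirrors the two failure modes the Figure 3 caption associates with low R-IB and low D-IB.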