
Learning to Theorize the World from Observation

Doojin Baek, Gyubin Lee, Junyeob Baek, Hosung Lee, Sungjin Ahn

Abstract

What does it mean to understand the world? Contemporary world models often operationalize understanding as accurate future prediction in latent or observation space. Developmental cognitive science, however, suggests a different view: human understanding emerges through the construction of internal theories of how the world works, even before mature language is acquired. Inspired by this theory-building view of cognition, we introduce Learning-to-Theorize, a learning paradigm for inferring explicit explanatory theories of the world from raw, non-textual observations. We instantiate this paradigm with the Neural Theorizer (NEO), a probabilistic neural model that induces latent programs as a learned Language of Thought and executes them through a shared transition model. In NEO, a theory is represented as an executable, compositional program whose learned primitives can be systematically recombined to explain novel phenomena. Experiments show that this formulation enables explanation-driven generalization, allowing observations to be understood in terms of the programs that generate them.

Paper Structure

This paper contains 85 sections, 14 equations, 22 figures, 27 tables, and 3 algorithms.

Figures (22)

  • Figure 1: Learning to Theorize (L2T) Framework. (a) Training data consists of observation pairs $(x,y)$ generated by unobserved true programs. (b) Under L2T, the model learns to discover reusable primitives (Rotate, Left, Down, and Paint) and to compose them into executable theories. (c) Without L2T, the model instead memorizes entangled composite primitives (e.g., Left-Down) as indecomposed single units. (d) Once the model has learned to theorize, novel phenomena (e.g., Down-Paint-Rotate) can be explained by recombining learned primitives. (e) In contrast, memorized entangled representations fail to generalize to unseen programs.
  • Figure 2: Computation graph of Neural Theorizer (NEO). NEO infers a latent program by iteratively selecting a primitive $z_{ik}$ with the theory programmer $q_\phi(z_{ik} \mid s_k, y)$ and executing it via the transition model $p_\theta(s_{k+1} \mid s_k, z_{ik})$. Each intermediate state $s_k$ is decoded into a full reconstruction $\hat{y}_k = D_\theta(s_k)$; through state grounding (described in the state grounding section), these intermediate predictions are explicitly regularized to remain valid observations, preventing degenerate or blurry intermediate states. The MDL criterion selects the shortest accurate explanation length $k^*$ (green), which in turn provides a learning signal that favors short yet accurate program compositions (see the MDL section). An illustrative sketch of this inference loop appears after the figure list.
  • Figure 3: Comparison of image-editing performance across $\alpha$-controlled dataset complexity and OOD settings, including length OOD. NEO consistently outperforms baselines across all $\alpha$-controlled OOD regimes and length OOD, for both self-explainability and transferability, as measured by the $\ell_1$ distance between the predicted image $\hat{y}$ and the ground-truth target $y$ (lower is better).
  • Figure 4: Visualization of explanations for a compositional OOD in the image-editing task ($\alpha = 0.66$). The leftmost column shows the observed source–target pair $(x, y)$. Baseline models generate $y$ via a single-step prediction or by relying on action combinations observed only in the in-distribution data, and thus fail to decompose the novel OOD transformation. In contrast, NEO explains the same phenomenon as a sequence of learned primitive actions, enabling systematic OOD generalization through explicit compositional explanations.
  • Figure 5: Visualization of instance-wise program length selection under the MDL principle. For each instance, the model selects an optimal program length $k^*$ that aligns with the ground-truth number of underlying transitions, demonstrating adaptive explanation length rather than a fixed horizon. In addition, the selected programs recover semantically correct action sequences; see the section on primitive information for details on primitive definitions.
  • ...and 17 more figures
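
To make the computation graph described in the Figure 2 caption concrete, below is a minimal, self-contained sketch of a NEO-style inference loop: a programmer network picks a primitive given the current state and the target, a shared transition model applies it, a decoder grounds each intermediate state in observation space, and the shortest accurate program length $k^*$ is kept via an MDL-style score. All names (ToyTheorizer, mdl_weight), dimensions, toy vector observations, and the greedy decoding scheme are illustrative assumptions, not the paper's actual architecture or criterion.

```python
# Minimal sketch of a NEO-style inference loop (illustrative assumptions throughout).
import torch
import torch.nn as nn

OBS_DIM, STATE_DIM, NUM_PRIMITIVES, MAX_LEN = 64, 32, 8, 6


class ToyTheorizer(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(OBS_DIM, STATE_DIM)                      # x -> s_0
        self.programmer = nn.Linear(STATE_DIM + OBS_DIM, NUM_PRIMITIVES)  # q_phi(z_k | s_k, y)
        self.primitives = nn.Embedding(NUM_PRIMITIVES, STATE_DIM)         # learned primitive codes
        self.transition = nn.Linear(2 * STATE_DIM, STATE_DIM)             # p_theta(s_{k+1} | s_k, z_k)
        self.decoder = nn.Linear(STATE_DIM, OBS_DIM)                      # D_theta(s_k) -> y_hat_k

    @torch.no_grad()
    def explain(self, x, y, mdl_weight=0.05):
        """Greedily compose primitives; keep the shortest accurate program (MDL-style)."""
        s = self.encoder(x)
        program, best = [], (float("inf"), 0, self.decoder(s))
        for k in range(1, MAX_LEN + 1):
            logits = self.programmer(torch.cat([s, y], dim=-1))
            z = logits.argmax(dim=-1)                                      # select a primitive
            s = self.transition(torch.cat([s, self.primitives(z)], dim=-1))
            y_hat = self.decoder(s)                                        # ground the state in obs space
            program.append(int(z))
            score = (y_hat - y).abs().mean().item() + mdl_weight * k      # accuracy + description length
            if score < best[0]:
                best = (score, k, y_hat)
        k_star = best[1]
        return program[:k_star], best[2]                                   # shortest accurate explanation


if __name__ == "__main__":
    model = ToyTheorizer()
    x, y = torch.randn(OBS_DIM), torch.randn(OBS_DIM)
    program, y_hat = model.explain(x, y)
    print("selected program:", program, "| reconstruction error:", (y_hat - y).abs().mean().item())
```

In this sketch the MDL trade-off is reduced to a single scalar penalty on program length; the paper's actual criterion and training objective should be taken from the corresponding sections rather than this toy score.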