Neural Fields as World Models

Joshua Nunley

TL;DR

Isomorphic world models, implemented as neural fields with motor-gated channels, suggest that intuitive physics and body schema may share a common origin in spatially structured neural dynamics.

Abstract

How does the brain predict physical outcomes while acting in the world? Machine learning world models compress visual input into latent spaces, discarding the spatial structure that characterizes sensory cortex. We propose isomorphic world models: architectures preserving sensory topology so that physics prediction becomes geometric propagation rather than abstract state transition. We implement this using neural fields with motor-gated channels, where activity evolves through local lateral connectivity and motor commands multiplicatively modulate specific populations. Three experiments support this approach: (1) local connectivity is sufficient to learn ballistic physics, with predictions traversing intermediate locations rather than "teleporting"; (2) policies trained entirely in imagination transfer to real physics at nearly twice the rate of latent-space alternatives; and (3) motor-gated channels spontaneously develop body-selective encoding through visuomotor prediction alone. These findings suggest intuitive physics and body schema may share a common origin in spatially structured neural dynamics.

Paper Structure (11 sections, 3 equations, 6 figures)

Figures (6)

  • Figure 1: Neural field world model architecture. Visual input $I_t$ is added to the hidden state only during the first three timesteps of each sequence; thereafter, the field evolves autonomously through local lateral convolution ($K$) and recurrence. Motor commands $\mathbf{m}$ multiplicatively gate specific channels, implementing gain modulation. A 1x1 convolution reconstructs the visual prediction $\hat{I}_{t+1}$.
  • Figure 2: Trajectory prediction through internal dynamics. Each column shows a different ballistic trajectory (50--59 timesteps). Both models observe only the first three frames before predicting the remainder without visual input. The neural field (green) maintains smooth parabolic arcs closely tracking ground truth (gray dotted), while the VAE-LSTM (orange) exhibits erratic oscillations.
  • Figure 3: Visuomotor prediction in the arm catching task. Both models receive motor commands throughout, but visual input only during the first three frames (Obs). Ground truth shown as gray dots. The vertical dashed line marks the transition to blind prediction.
  • Figure 4: Dream training transfers to real physics. Lines show loss during dream training (right axis; plotted as catch predictor value, i.e., negated loss, so higher = better; ceiling $\approx$60% since the ball reaches catching distance $\approx$40% through each episode). Points show real catch rate at evaluation (left axis). Neural field policies (green) achieve 81.5%, approaching the physics baseline (dotted). VAE-LSTM (orange) achieves comparable dream performance but shows a larger sim-to-real gap.
  • Figure 5: Training dynamics. Neural field (green) achieves lower final loss than VAE-LSTM (orange) on the ballistic task and comparable loss on the arm task, despite using 17--67$\times$ fewer parameters. Shaded regions show IQR across seeds.
  • ...and 1 more figures
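
The architecture described in the Figure 1 caption can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the `tanh` nonlinearity, the sigmoid-based gate, the channel counts, and the choice to add the visual input before the nonlinearity are all assumptions made for illustration. Only the overall structure is taken from the caption: a local lateral convolution ($K$) with recurrence, visual input during the first three timesteps, multiplicative motor gating of specific channels, and a 1x1 convolution readout.

```python
import numpy as np


def init_params(rng, n_channels=8, n_gated=4, motor_dim=2, k=3):
    """Random weights for the sketch; all names and sizes are hypothetical."""
    return {
        "K": rng.normal(scale=0.1, size=(n_channels, n_channels, k, k)),
        "w_in": rng.normal(size=n_channels),
        "W_m": rng.normal(size=(n_gated, motor_dim)),
        "w_out": rng.normal(size=n_channels),
    }


def lateral_conv(h, K):
    """Zero-padded local convolution (cross-correlation convention).

    h has shape (C_in, H, W); K has shape (C_out, C_in, k, k).
    """
    c_out, _, k, _ = K.shape
    pad = k // 2
    height, width = h.shape[1:]
    hp = np.pad(h, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((c_out, height, width))
    for dy in range(k):
        for dx in range(k):
            patch = hp[:, dy:dy + height, dx:dx + width]
            out += np.einsum("oi,ihw->ohw", K[:, :, dy, dx], patch)
    return out


def motor_gate(m, W_m, n_channels):
    """Per-channel gain. Assumption: the first rows of W_m gate the first
    channels with 2*sigmoid(W_m @ m), so a zero command leaves gain at 1;
    the remaining channels are ungated."""
    gate = np.ones(n_channels)
    n_gated = W_m.shape[0]
    gate[:n_gated] = 2.0 / (1.0 + np.exp(-(W_m @ m)))
    return gate


def field_step(h, m, params, frame=None):
    """One update: lateral convolution, optional visual input, motor gating."""
    drive = lateral_conv(h, params["K"])
    if frame is not None:
        # Visual input is injected into every channel through w_in (assumption).
        drive = drive + params["w_in"][:, None, None] * frame[None, :, :]
    gate = motor_gate(m, params["W_m"], h.shape[0])
    return np.tanh(drive) * gate[:, None, None]


def rollout(frames, motors, params, n_obs=3):
    """Observe the first n_obs frames, then evolve without visual input."""
    n_channels = params["w_in"].shape[0]
    h = np.zeros((n_channels,) + frames.shape[1:])
    preds = []
    for t, m in enumerate(motors):
        frame = frames[t] if t < n_obs else None
        h = field_step(h, m, params, frame)
        # 1x1 convolution readout: a weighted sum over channels at each pixel.
        preds.append(np.einsum("c,chw->hw", params["w_out"], h))
    return np.stack(preds)


rng = np.random.default_rng(0)
params = init_params(rng)
frames = np.zeros((3, 16, 16))
frames[0, 8, 8] = 1.0           # a single bright pixel in the first frame
motors = np.zeros((6, 2))       # six timesteps of zero motor command
preds = rollout(frames, motors, params)
print(preds.shape)              # (6, 16, 16)
```

With a 3x3 kernel and no bias terms, activity in this sketch can spread by at most one pixel per timestep, so a prediction can only reach a distant location by passing through the locations in between. This is the structural property behind the abstract's contrast between traversing intermediate locations and "teleporting".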