The Physical Basis of Prediction: World Model Formation in Neural Organoids via an LLM-Generated Curriculum

Brennen Hill

TL;DR

This work investigates how internal world models can form in living human neural organoids by embedding them in three progressively complex virtual environments. It couples a multielectrode array (MEA) interface with predictive coding and model-based reinforcement learning to drive learning, and introduces a multimodal evaluation that links behavioral adaptation to synaptic plasticity. A key innovation is the use of a Large Language Model to autonomously design and optimize experimental curricula, enabling scalable exploration of environment-agent interactions. The framework offers a biologically grounded platform for studying embodiment, decision-making, and the physical basis of intelligence, bridging computational neuroscience and organoid-based learning.

Abstract

The capacity of an embodied agent to understand, predict, and interact with its environment depends on an internal world model. This paper introduces a framework for investigating the formation and adaptation of such world models within a biological substrate: human neural organoids. We present a curriculum of three scalable, closed-loop virtual environments designed to train these biological agents and probe the underlying synaptic mechanisms of learning, such as long-term potentiation (LTP) and long-term depression (LTD). The three task environments demand progressively more sophisticated world models for successful decision-making: (1) a conditional avoidance task for learning static state-action contingencies, (2) a one-dimensional predator-prey scenario for goal-directed interaction, and (3) a replication of the classic Pong game for modeling dynamic, continuous-time systems. For each environment, we formalize the state and action spaces, the sensory encoding and motor decoding mechanisms, and the feedback protocols, in which predictable stimulation serves as reward and unpredictable stimulation as punishment to drive model refinement. We further propose a meta-learning approach in which a Large Language Model automates the generative design and optimization of experimental protocols, thereby scaling environment and curriculum design. Finally, we outline a multi-modal evaluation strategy that moves beyond task performance to directly measure the physical correlates of the learned world model by quantifying synaptic plasticity at electrophysiological, cellular, and molecular levels. This work bridges model-based reinforcement learning and computational neuroscience, offering a platform for studying embodiment, decision-making, and the physical basis of intelligence.

Paper Structure

This paper contains 27 sections and 3 figures.

Figures (3)

  • Figure 1: Diagram of the Conditional Avoidance environment. The agent, a neural organoid, is embodied in a 1D world. Its decision to move left or right is decoded from neural activity. Entering the aversive zone triggers unpredictable punishment, driving the agent to learn a world model where that region is associated with negative outcomes.
  • Figure 2: Diagram of the 1D Predator-Prey environment. The agent (predator) must interpret sensory information about its own location and the prey's location to decide on an action. A successful capture is rewarded, driving the agent to build a world model that supports goal-directed navigation.
  • Figure 3: The Pong environment. The organoid receives complex, multi-modal sensory input about the ball's position (rate-coded distance, spatially-coded location) and must learn a predictive world model of the ball's trajectory to control its paddle and achieve a successful interception.
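
The closed loop described for Figure 1 can be illustrated with a minimal Python sketch. This is not the paper's implementation: the world size, the placement of the aversive zone, the class name `ConditionalAvoidanceEnv`, the helper `feedback_pattern`, and the electrode count are all hypothetical choices made only to show how a decoded left/right action could map to predictable (reward) or unpredictable (punishment) stimulation.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class ConditionalAvoidanceEnv:
    """Hypothetical 1D world; all sizes and zone positions are assumptions."""

    n_cells: int = 10          # length of the 1D world (assumed)
    aversive_start: int = 7    # cells at or beyond this index are aversive (assumed)
    position: int = 3          # agent's starting cell (assumed)

    def step(self, action: str) -> str:
        """Move one cell left or right and return the feedback type."""
        if action not in ("left", "right"):
            raise ValueError("action must be 'left' or 'right'")
        delta = -1 if action == "left" else 1
        # Clamp the new position to the boundaries of the world.
        self.position = max(0, min(self.n_cells - 1, self.position + delta))
        if self.position >= self.aversive_start:
            return "unpredictable"  # punishment: agent is inside the aversive zone
        return "predictable"        # reward: agent is outside the aversive zone


def feedback_pattern(kind: str, n_electrodes: int = 8,
                     rng: random.Random | None = None) -> list[int]:
    """Return an illustrative sequence of electrode indices to stimulate.

    Predictable feedback is a fixed sweep; unpredictable feedback is a
    random sequence. Both patterns are assumptions, not the paper's protocol.
    """
    if kind == "predictable":
        return list(range(n_electrodes))
    rng = rng or random.Random()
    return [rng.randrange(n_electrodes) for _ in range(n_electrodes)]


if __name__ == "__main__":
    env = ConditionalAvoidanceEnv()
    for action in ["right", "right", "right", "right", "left"]:
        kind = env.step(action)
        print(action, env.position, kind, feedback_pattern(kind, n_electrodes=4))
```

In a real experiment the `action` argument would come from decoding the organoid's neural activity and the returned pattern would be delivered through the MEA; here both ends are stubbed so that only the state-action contingency is visible.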