The Physical Basis of Prediction: World Model Formation in Neural Organoids via an LLM-Generated Curriculum
Brennen Hill
TL;DR
This work investigates how internal world models can form in living human neural organoids by embedding them in three progressively more complex virtual environments. It combines an MEA interface with predictive coding and model-based reinforcement learning to drive learning, and introduces a multimodal evaluation that links behavioral adaptation to synaptic plasticity. A key innovation is the use of a Large Language Model to autonomously design and optimize experimental curricula, enabling scalable exploration of environment-agent interactions. The framework offers a biologically grounded platform for studying embodiment, decision-making, and the physical basis of intelligence, bridging computational neuroscience and organoid-based learning.
Abstract
The capacity of an embodied agent to understand, predict, and interact with its environment depends on an internal world model. This paper introduces a framework for investigating the formation and adaptation of such world models within a biological substrate: human neural organoids. We present a curriculum of three scalable, closed-loop virtual environments designed to train these biological agents and to probe the underlying synaptic mechanisms of learning, such as long-term potentiation (LTP) and long-term depression (LTD). The three task environments demand progressively more sophisticated world models for successful decision-making: (1) a conditional avoidance task for learning static state-action contingencies, (2) a one-dimensional predator-prey scenario for goal-directed interaction, and (3) a replication of the classic Pong game for modeling dynamic, continuous-time systems. For each environment, we formalize the state and action spaces, the sensory encoding and motor decoding mechanisms, and the feedback protocols, which use predictable stimulation as reward and unpredictable stimulation as punishment to drive model refinement. We further propose a meta-learning approach in which a Large Language Model automates the generative design and optimization of experimental protocols, thereby scaling the process of environment and curriculum design. Finally, we outline a multimodal evaluation strategy that moves beyond task performance to directly measure the physical correlates of the learned world model by quantifying synaptic plasticity at the electrophysiological, cellular, and molecular levels. This work connects model-based reinforcement learning with computational neuroscience, offering a platform for studying embodiment, decision-making, and the physical basis of intelligence.
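To make the closed-loop feedback idea concrete, the following is a minimal Python sketch of how the one-dimensional predator-prey task might be simulated in software, with predictable stimulation delivered as reward and unpredictable stimulation as punishment. It is an illustration under stated assumptions, not the protocol used in this work: the class and function names, the track length, the stationary prey, the step limit, and the pulse counts and timings are all hypothetical, and a scripted policy stands in for the decoded organoid activity.

```python
import random
from dataclasses import dataclass

# NOTE: every name and numeric value here is an illustrative assumption,
# not a parameter taken from the paper.


@dataclass
class Feedback:
    kind: str             # "reward", "punishment", or "none"
    pulse_times_ms: list  # stimulation pulse onsets within the feedback window


def predictable_stimulation(n_pulses=10, interval_ms=10.0):
    """Reward: regularly spaced pulses, so the timing is fully predictable."""
    return [i * interval_ms for i in range(n_pulses)]


def unpredictable_stimulation(n_pulses=10, window_ms=100.0, rng=None):
    """Punishment: pulses at random times drawn uniformly inside the window."""
    rng = rng or random.Random()
    return sorted(rng.uniform(0.0, window_ms) for _ in range(n_pulses))


class PredatorPrey1D:
    """Hypothetical 1D track on which the agent pursues a stationary prey."""

    def __init__(self, length=8, agent=0, prey=5, max_steps=10, seed=0):
        self.length, self.agent, self.prey = length, agent, prey
        self.max_steps, self.steps = max_steps, 0
        self.rng = random.Random(seed)

    def observe(self):
        """State handed to the (assumed) sensory encoder: signed distance to prey."""
        return self.prey - self.agent

    def step(self, action):
        """Apply an action of -1 (left) or +1 (right); return (feedback, done)."""
        self.agent = min(self.length - 1, max(0, self.agent + action))
        self.steps += 1
        if self.agent == self.prey:
            return Feedback("reward", predictable_stimulation()), True
        if self.steps >= self.max_steps:
            pulses = unpredictable_stimulation(rng=self.rng)
            return Feedback("punishment", pulses), True
        return Feedback("none", []), False


def run_episode(env, policy):
    """Run one closed-loop episode and return the terminal feedback."""
    done = False
    while not done:
        feedback, done = env.step(policy(env.observe()))
    return feedback
```

In this sketch, a policy that always moves toward the prey ends the episode with regularly spaced pulses, while a policy that never reaches it ends with randomly timed pulses. In an actual experiment the `policy` argument would be replaced by the motor decoding of recorded activity, and the pulse lists would be sent to the stimulation hardware.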
