Discovering Mechanistic Models of Neural Activity: System Identification in an in Silico Zebrafish

Jan-Matthis Lueckmann; Viren Jain; Michał Januszewski

TL;DR

The paper tackles verifiable discovery of neural mechanisms by using a fully in silico zebrafish testbed with a known ground-truth transition $f^*$ and horizon $H=256$ to benchmark mechanistic system identification. It combines a high-fidelity neuromechanical simulator (simZFish) with an automated LLM-guided tree search to evolve interpretable transition functions that predict neural activity in a visuomotor circuit. Results show that sensory drive is necessary for identifiability, while unconstrained tree-search approaches can achieve excellent in-distribution predictive accuracy but fail under novel stimuli; incorporating structural priors yields robust out-of-distribution generalization and faithful recovery of effective connectivity and impulse-response dynamics. The authors propose practical guidelines for real-world neural data analysis, emphasizing connectome-constrained forecasting, OOD evaluation, and the use of wiring as a structural scaffold to align AI-driven discovery with mechanistic neuroscience.

Abstract

Constructing mechanistic models of neural circuits is a fundamental goal of neuroscience, yet verifying such models is limited by the lack of ground truth. To rigorously test model discovery, we establish an in silico testbed using neuromechanical simulations of a larval zebrafish as a transparent ground truth. We find that LLM-based tree search autonomously discovers predictive models that significantly outperform established forecasting baselines. Conditioning on sensory drive is necessary but not sufficient for faithful system identification, as models exploit statistical shortcuts. Structural priors prove essential for enabling robust out-of-distribution generalization and recovery of interpretable mechanistic models. Our insights provide guidance for modeling real-world neural recordings and offer a broader template for AI-driven scientific discovery.
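The role of sensory drive in forecasting can be illustrated with a small sketch. It is an assumption-laden toy, not the simZFish circuit or the authors' evaluation code: the transition functions `f_true` and `f_no_drive`, the matrices `A` and `B`, and the helper names `rollout` and `mae_per_horizon` are all hypothetical. Only the horizon of 256 steps and the use of mean absolute error across prediction horizons are taken from the summary above.

```python
import numpy as np

def rollout(f, x0, u, horizon):
    """Autoregressively apply transition f for `horizon` steps.

    x0: (n,) initial state; u: (horizon, m) exogenous sensory drive.
    Returns the predicted trajectory with shape (horizon, n).
    """
    x = np.asarray(x0, dtype=float)
    traj = np.empty((horizon, x.shape[0]))
    for t in range(horizon):
        x = f(x, u[t])      # next state from current state and drive
        traj[t] = x
    return traj

def mae_per_horizon(pred, target):
    """Mean absolute error at each prediction step, averaged over neurons."""
    return np.abs(pred - target).mean(axis=1)

# Hypothetical toy system standing in for a ground-truth transition.
rng = np.random.default_rng(0)
n, m, H = 4, 2, 256                      # H = 256 as in the TL;DR
A = 0.9 * np.eye(n)                      # assumed leaky recurrent dynamics
B = 0.1 * rng.normal(size=(n, m))        # assumed input weights

def f_true(x, u):
    return np.tanh(A @ x + B @ u)

def f_no_drive(x, u):
    return np.tanh(A @ x)                # same dynamics, sensory drive ignored

u = rng.normal(size=(H, m))
x0 = np.zeros(n)
target = rollout(f_true, x0, u, H)

err_with = mae_per_horizon(rollout(f_true, x0, u, H), target)
err_without = mae_per_horizon(rollout(f_no_drive, x0, u, H), target)
```

In this toy, the model that sees the drive reproduces the target exactly, while the history-only model cannot track input-driven activity. It does not reproduce the paper's stronger finding that the ground-truth circuit without drive scores worse than a naive mean baseline.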

Paper Structure (42 sections, 21 equations, 8 figures, 2 tables)

Introduction
Methods
Neuromechanical Simulations
Simulation Environment
Neural Circuit Model
Simulated Behavior
Dataset Generation
System Identification
Task Definition
Evaluation
Baseline Models
Tree Search
Data and Code
Results
Sensory Drive and Identifiability
...and 27 more sections

Figures (8)

Figure 1: Verifiable discovery of neural mechanisms in an in silico testbed. a. The simulation environment consists of a neuromechanical model subject to fluid dynamics, responding to visual stimuli driven by a neural circuit in a closed-loop setting. b. We use LLM-guided tree search to autonomously explore the space of dynamical models, evolving Python code to minimize predictive error on neural activity. c. Despite high predictive in-distribution performance, unconstrained black-box tree search models fail to identify the system's mechanisms, as revealed by effective connectivity matrices (excerpt shown; blue and red indicate inhibitory and excitatory interactions, respectively; color intensity represents magnitude). In contrast, a structure-constrained grey-box tree search model successfully identifies the correct signs and magnitudes closely matching the ground truth, from a structural prior that provides information about the existence and absence of connections.
Figure 2: Neural circuit model. a. The neural circuit model defines the information flow, processing retinal input through the early and late pretectum (ePT, lPT) to drive downstream command nuclei (nMLF, aHB) and motor circuits. b. Connectivity diagram of an example late pretectal neuron ($\text{oB}_\text{1}$), illustrating the specific excitatory and inhibitory wiring with ePT neurons resulting in its direction-selectivity.
Figure 3: Sensory information is a prerequisite for system identification. Performance (Test MAE, log scale) of baseline models and the ground-truth (gt) circuit across prediction horizons. Models marked with the subscript s+h are conditioned on exogenous sensory drive, whereas models marked with h use only past history. gt$_\text{h}$ yields higher error than the naive mean$_\text{h}$ baseline, suggesting that the true solution is non-identifiable without sensory drive under standard error metrics.
Figure 4: Tree search discovers SOTA predictive models. a. Progression of LLM-guided tree search, highlighting first and highest-scoring solutions (ts001, ts422). For visual clarity, this tree is pruned to show only the ancestral path of ts422; the complete, unpruned search tree is shown in a separate figure (label fig:trees). b. Performance comparison of discovered models against human-curated baselines. The highest-scoring discovered tree-search architecture (ts422) significantly outperforms baselines across all prediction horizons.
Figure 5: Structural priors enable robust generalization. a. Generalization analysis comparing performance on in-distribution (Test MAE) versus out-of-distribution stimuli (Holdout MAE). Tree search models (ts, purple) exhibit a significant generalization gap; while they achieve high accuracy on the test set, they fail to generalize to novel stimuli, suggesting they memorize sensory-motor correlations. Structurally-constrained tree search solutions (sts, green) cluster closer along the diagonal, indicating robust transfer to unseen environments. Dotted purple line marks the convex hull for ts solutions. b. Holdout performance across prediction horizons. The highest-scoring constrained model (sts445) significantly outperforms its unconstrained counterpart (ts422) and baselines.
...and 3 more figures
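
The Figure 1 caption describes a structural prior that specifies which connections exist and which are absent. One simple way to encode such a prior is a binary mask over a recurrent weight matrix. The sketch below is a hypothetical illustration of that idea, not the authors' grey-box model: `masked_step`, `sign_agreement`, and every matrix value are assumptions made for the example.

```python
import numpy as np

def masked_step(x, u, W, W_in, mask):
    """One step of a toy grey-box model.

    Recurrent weights W are multiplied elementwise by a binary `mask`,
    so connections the structural prior rules out contribute nothing.
    """
    return np.tanh((W * mask) @ x + W_in @ u)

def sign_agreement(W_est, W_true, mask):
    """Fraction of permitted connections whose estimated sign is correct."""
    allowed = mask.astype(bool)
    return float(np.mean(np.sign(W_est[allowed]) == np.sign(W_true[allowed])))

# Hypothetical 3-neuron wiring: 1 = connection exists, 0 = absent.
mask = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]])

# Hypothetical ground-truth weights (positive = excitatory, negative = inhibitory).
W_true = np.array([[ 0.0,  0.8, 0.0],
                   [-0.5,  0.0, 0.3],
                   [ 0.0, -0.7, 0.0]])

# A dense, unconstrained weight matrix puts weight on absent connections.
W_dense = np.full((3, 3), 0.2)
W_grey = W_dense * mask                  # the prior removes those entries

x_next = masked_step(np.ones(3), np.zeros(1), W_dense, np.zeros((3, 1)), mask)

# An estimate with the right signs but half the magnitude still agrees in sign.
agreement = sign_agreement(0.5 * W_true, W_true, mask)
```

The mask fixes only where connections may exist. Signs and magnitudes remain free parameters, which matches the caption's description of a model that recovers signs and magnitudes from a prior about existence and absence of connections.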
