
ZAPBench: A Benchmark for Whole-Brain Activity Prediction in Zebrafish

Jan-Matthis Lueckmann, Alexander Immer, Alex Bo-Yuan Chen, Peter H. Li, Mariela D. Petkova, Nirmala A. Iyer, Luuk Willem Hesselink, Aparna Dev, Gudrun Ihrke, Woohyun Park, Alyson Petruncio, Aubrey Weigel, Wyatt Korff, Florian Engert, Jeff W. Lichtman, Misha B. Ahrens, Michał Januszewski, Viren Jain

TL;DR

ZAPBench establishes a rigorous benchmark for predicting whole-brain neural activity at cellular resolution in a vertebrate. It uses a 4D light-sheet zebrafish dataset and a synapse-level connectome of the same animal that is still being reconstructed. It formalizes forecasting over a horizon of $H=32$ steps from a context of either $C=4$ or $C=256$ steps. It supports both time-series (activity trace) and volumetric video inputs, and it scores predictions by mean absolute error (MAE) across multiple stimulus conditions. The authors provide naive baselines and representative models (time-series and U-Net-based volumetric forecasting). They report that, while these models outperform the naive baselines, there is substantial room for improvement, especially in leveraging information across neurons and integrating structural data. They highlight directions such as graph-based and latent-variable models, probabilistic forecasting, and connectome-informed approaches. They also release code, data, and interactive visualization tools to catalyze future advances. Overall, ZAPBench aims to accelerate predictive neuroscience by offering a scalable, open platform for brain-wide activity forecasting and method development.
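The forecasting setup has three parts: a context of $C$ past steps, a horizon of $H=32$ future steps, and an MAE score. Here is a minimal NumPy sketch of that windowed evaluation. It assumes activity traces are stored as a `(T, N)` array (time steps by neurons). The helper names `make_windows`, `mean_context_forecast`, and `mae_per_step` are hypothetical. The mean-of-context predictor is an illustrative naive baseline and not necessarily how the benchmark defines its own baselines.

```python
import numpy as np


def make_windows(traces, context, horizon):
    """Slice a (T, N) trace matrix into overlapping (context, target) pairs.

    Returns arrays of shape (W, context, N) and (W, horizon, N),
    where W = T - context - horizon + 1.
    """
    n_steps = traces.shape[0]
    n_windows = n_steps - context - horizon + 1
    if n_windows <= 0:
        raise ValueError("series too short for context + horizon")
    starts = range(n_windows)
    ctx = np.stack([traces[s:s + context] for s in starts])
    tgt = np.stack([traces[s + context:s + context + horizon] for s in starts])
    return ctx, tgt


def mean_context_forecast(ctx, horizon):
    """Hypothetical naive baseline: repeat each neuron's context mean for every future step."""
    m = ctx.mean(axis=1, keepdims=True)   # (W, 1, N)
    return np.repeat(m, horizon, axis=1)  # (W, H, N)


def mae_per_step(pred, tgt):
    """MAE averaged over windows and neurons, one value per step ahead."""
    return np.abs(pred - tgt).mean(axis=(0, 2))
```

With $C=4$ and $H=32$, a series of `T` steps yields `T - 35` windows. `mae_per_step` returns 32 values, which shows how error grows with forecast distance; averaging them gives a single score.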

Abstract

Data-driven benchmarks have led to significant progress in key scientific modeling domains, including weather and structural biology. Here, we introduce the Zebrafish Activity Prediction Benchmark (ZAPBench) to measure progress on the problem of predicting cellular-resolution neural activity throughout an entire vertebrate brain. The benchmark is based on a novel dataset containing 4D light-sheet microscopy recordings of over 70,000 neurons in a larval zebrafish brain. It also includes motion-stabilized versions of these recordings and voxel-level cell segmentations, which facilitate the development of a variety of forecasting methods. Initial results from a selection of time-series and volumetric video modeling approaches achieve better performance than naive baseline methods, but also show room for further improvement. The specific brain used in the activity recording is also undergoing synaptic-level anatomical mapping, which will enable future integration of detailed structural information into forecasting methods.
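A simple way to turn an aligned volume and a voxel-level segmentation into per-cell activity traces is to average the fluorescence of all voxels that carry each cell label. The sketch below does this and adds a percentile-based ΔF/F normalization. The label convention (0 = background, 1..K = cells) and the percentile baseline are illustrative assumptions. `extract_traces` and `delta_f_over_f` are hypothetical helpers, not the ZAPBench processing pipeline.

```python
import numpy as np


def extract_traces(frame, labels):
    """Mean fluorescence of each segmented cell in one motion-corrected frame.

    frame:  float array (z, y, x) of fluorescence values.
    labels: non-negative int array of the same shape; 0 marks background,
            1..K mark cell ids. Returns a length-K array (NaN for ids with no voxels).
    """
    if frame.shape != labels.shape:
        raise ValueError("frame and labels must have the same shape")
    flat_labels = labels.ravel().astype(np.int64)
    n_ids = int(flat_labels.max()) + 1
    # Sum of fluorescence and voxel count per label, computed in one pass each.
    sums = np.bincount(flat_labels, weights=frame.ravel(), minlength=n_ids)
    counts = np.bincount(flat_labels, minlength=n_ids)
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts
    return means[1:]  # drop the background id 0


def delta_f_over_f(traces, percentile=10.0):
    """Hypothetical ΔF/F normalization of a (T, K) trace matrix.

    The baseline F0 is a per-cell low percentile over time; the default
    percentile is an illustrative assumption.
    """
    f0 = np.percentile(traces, percentile, axis=0)
    return (traces - f0) / f0
```

Applying `extract_traces` to every frame and stacking the results gives a `(T, K)` matrix. That matrix is the kind of data shown as a raster in Figure 3A.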

Paper Structure

This paper contains 39 sections, 12 equations, 17 figures, 1 table.

Figures (17)

  • Figure 1: Dataset and Benchmark. A. Whole-brain activity of a larval zebrafish at cellular resolution was recorded with a light-sheet microscopy setup, while the fish experienced a range of visual stimuli vladimirov2014lightsheet. In addition to the light-sheet (LS) 4D dataset, a synapse-resolution electron microscopy (EM) 3D dataset was acquired from the same animal. B. We propose a novel forecasting benchmark in which neural activity is predicted from past activity, using both time-series and volumetric video models. Predicted activity (PA) is compared to ground truth (GT), and performance is scored by computing the mean absolute error (MAE) between the two.
  • Figure 2: Light-sheet data and postprocessing. A. Frames of the raw calcium activity at different z-depths and time points. Brightness encodes fluorescence, i.e., activity. The last two panels compare the beginning and end of the experiment at the same depth, showing that the two are misaligned at cellular resolution. B. Flow fields estimated to correct for deformations in the volume. Color encodes the magnitude of the flow field in the y-direction, with more saturated colors indicating larger magnitude. Deformation is larger at the end of the session than at the beginning. C. Segmentation at different depths, which we used to extract activity traces from the aligned volume.
  • Figure 3: Activity traces. A. Time series for 71,721 neurons extracted from a two-hour whole-brain calcium recording. Color represents normalized activity ($\Delta F/F$), with brighter colors indicating higher activity. White lines mark changes of stimulus condition, with each condition's short name shown on top. Neurons are ordered by similarity using rastermap stringer2023rastermap. This representation compresses the neuron dimension relative to the time dimension: the original aspect ratio of neurons to timesteps is approximately 9:1, whereas the figure uses 1:2 for presentation purposes. B. Per-condition training/validation/test set splits.
  • Figure 4: Grand average results for short and long context. To compare overall performance, we take the grand average MAE (lower is better) across conditions for short ($C=4$) and long context ($C=256$). Error bars indicate variability due to random number generator seeding, excluding variability across conditions (95%-confidence intervals; 3 random seeds). Values are clipped to axis limits. The dotted black line indicates performance of the mean baseline, the solid line is the stimulus baseline. Per-condition results are reported in the supplement.
  • Figure 5: Hold-out condition results for short and long context. Performance on the taxis condition, which was held out from training, measured by MAE (lower is better). Points are individual runs (3 random seeds per model). Values shown are clipped to axis limits. The dotted black line indicates performance of the mean baseline. The poor performance of TiDE can be explained by its reliance on stimulus covariates, which are out-of-distribution for this condition.
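Figure 4 reports a grand average: per-condition MAE is averaged over conditions, and the error bars reflect only seed-to-seed variability. That aggregation can be sketched as follows. The Student-t interval is an assumption about how the 95% confidence intervals might be computed. `grand_average_with_ci` is a hypothetical helper.

```python
import numpy as np

# Two-sided 95% Student-t critical values, keyed by degrees of freedom.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}


def grand_average_with_ci(mae):
    """Grand-average MAE and 95% CI half-width over random seeds.

    mae: array (n_seeds, n_conditions) of per-condition MAE for one model
    and one context length. Conditions are averaged first, so the interval
    reflects seed-to-seed variability only, not variability across conditions.
    """
    mae = np.asarray(mae, dtype=float)
    per_seed = mae.mean(axis=1)  # one grand average per seed
    n = per_seed.size
    center = float(per_seed.mean())
    if n < 2:
        return center, float("nan")  # spread cannot be estimated from one seed
    dof = n - 1
    if dof not in T_CRIT_95:
        raise ValueError("critical value table only covers 2 to 6 seeds")
    sem = per_seed.std(ddof=1) / np.sqrt(n)  # standard error of the mean
    return center, float(T_CRIT_95[dof] * sem)
```

With 3 seeds, as in Figure 4, the interval uses the t critical value for 2 degrees of freedom (about 4.30). This makes it considerably wider than a normal-approximation interval (1.96 standard errors) would be.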