Forecasting Whole-Brain Neuronal Activity from Volumetric Video

Alexander Immer, Jan-Matthis Lueckmann, Alex Bo-Yuan Chen, Peter H. Li, Mariela D. Petkova, Nirmala A. Iyer, Aparna Dev, Gudrun Ihrke, Woohyun Park, Alyson Petruncio, Aubrey Weigel, Wyatt Korff, Florian Engert, Jeff W. Lichtman, Misha B. Ahrens, Viren Jain, Michał Januszewski

TL;DR

This work tackles forecasting whole-brain neuronal activity directly from volumetric video, addressing information loss from ROI-based trace extraction. It introduces a scalable 4D UNet that treats temporal context as input channels and uses lead-time conditioning to predict the next $H$ frames, evaluated with voxel-wise MAE on ZAPBench data. Key findings show that the volumetric video approach outperforms trace-based methods for short horizons by harnessing spatial correlations, with a notable trade-off between spatial context and temporal context and minimal gains from cross-specimen pre-training. While the method incurs substantially higher compute costs, it preserves spatial structure and demonstrates potential for improved first-step forecasts, suggesting future directions in probabilistic and latent representations to further exploit volumetric brain data. The temporal context $C$ and the forecast horizon $H$, both measured in frames, are central to how forecasts are framed and evaluated.
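A minimal NumPy sketch of how the inputs and targets described above could be assembled, assuming a video array of shape (T, X, Y, Z). The $C$ most recent frames are stacked along a trailing channel axis, and the model is queried once per lead time $h = 1, \dots, H$, producing one frame per call. The source does not specify how lead time enters the network; the FiLM-style per-feature scale and shift below (with hypothetical weight matrices `w_scale` and `w_shift`) is an assumed mechanism, and all function names are illustrative.

```python
import numpy as np


def make_input(video: np.ndarray, t: int, context: int) -> np.ndarray:
    """Stack the `context` frames ending at index t (inclusive) as channels.

    video has shape (T, X, Y, Z); the result has shape (X, Y, Z, context).
    """
    if context < 1 or t + 1 < context or t >= video.shape[0]:
        raise ValueError("invalid time index or context length")
    window = video[t + 1 - context : t + 1]  # (C, X, Y, Z), oldest frame first
    return np.moveaxis(window, 0, -1)        # time axis becomes the channel axis


def make_targets(video: np.ndarray, t: int, horizon: int) -> np.ndarray:
    """Return the next `horizon` frames after index t, shape (H, X, Y, Z)."""
    return video[t + 1 : t + 1 + horizon]


def film_condition(features: np.ndarray, lead_time: int, horizon: int,
                   w_scale: np.ndarray, w_shift: np.ndarray) -> np.ndarray:
    """Hypothetical FiLM-style lead-time conditioning (assumed, not from the paper).

    features: (X, Y, Z, F); w_scale, w_shift: (horizon, F); lead_time in 1..horizon.
    """
    if not 1 <= lead_time <= horizon:
        raise ValueError("lead_time must lie in 1..horizon")
    one_hot = np.zeros(horizon)
    one_hot[lead_time - 1] = 1.0
    scale = one_hot @ w_scale                 # per-feature scale, shape (F,)
    shift = one_hot @ w_shift                 # per-feature shift, shape (F,)
    return features * (1.0 + scale) + shift   # broadcast over the spatial axes


# Usage: one input window, one conditioned pass per lead time.
rng = np.random.default_rng(0)
video = rng.random((100, 8, 8, 4))            # toy (T, X, Y, Z) volume
C, H, F = 4, 3, 16
x = make_input(video, t=49, context=C)        # (8, 8, 4, 4)
y = make_targets(video, t=49, horizon=H)      # (3, 8, 8, 4)
w_scale = rng.normal(size=(H, F))
w_shift = rng.normal(size=(H, F))
features = rng.random((8, 8, 4, F))           # stand-in for a UNet feature map
for h in range(1, H + 1):
    conditioned = film_condition(features, h, H, w_scale, w_shift)
```

In this sketch, increasing $C$ only widens the network's input channels, while the $H$ forecast frames come from $H$ separate conditioned calls rather than from one multi-frame output.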

Abstract

Large-scale neuronal activity recordings with fluorescent calcium indicators are increasingly common, yielding high-resolution 2D or 3D videos. Traditional analysis pipelines reduce this data to 1D traces by segmenting regions of interest, leading to inevitable information loss. Inspired by the success of deep learning on minimally processed data in other domains, we investigate the potential of forecasting neuronal activity directly from volumetric videos. To capture long-range dependencies in high-resolution volumetric whole-brain recordings, we design a model with large receptive fields, which allow it to integrate information from distant regions within the brain. We explore the effects of pre-training and perform extensive model selection, analyzing spatio-temporal trade-offs for generating accurate forecasts. Our model outperforms trace-based forecasting approaches on ZAPBench, a recently proposed benchmark on whole-brain activity prediction in zebrafish, demonstrating the advantages of preserving the spatial structure of neuronal activity.
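The abstract credits long-range integration to large receptive fields. As a back-of-the-envelope sketch, the standard receptive-field recurrence for a sequential stack of layers ($r \leftarrow r + (k-1)\,j$ and $j \leftarrow j\,s$, for kernel size $k$, stride $s$, and cumulative stride $j$) shows why downsampling levels matter. The layout below (two width-3 convolutions per level, 2x downsampling between levels, one spatial axis, encoder path only) is a hypothetical configuration, not the paper's architecture.

```python
def receptive_field(layers):
    """Receptive field (in input voxels, along one axis) of a sequential stack.

    layers: iterable of (kernel_size, stride) pairs, applied in order.
    """
    rf, jump = 1, 1  # jump = input-voxel distance between adjacent outputs
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


def encoder_layers(levels, convs_per_level=2, kernel=3):
    """Hypothetical UNet encoder: convs at each level, 2x downsampling between."""
    layers = []
    for level in range(levels):
        layers += [(kernel, 1)] * convs_per_level
        if level < levels - 1:
            layers.append((2, 2))  # 2x downsampling, e.g. pooling or strided conv
    return layers


for levels in range(1, 7):
    print(levels, receptive_field(encoder_layers(levels)))
```

In this toy setting the receptive field roughly doubles with each added level (5, 14, 32, 68, ... voxels), which is why a few extra low-resolution levels are a cheap way to let a convolutional model see distant brain regions.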

Paper Structure

This paper contains 24 sections, 4 equations, 13 figures, 3 tables.

Figures (13)

  • Figure 1: We propose to model light-sheet microscopy recordings of neural activity directly as volumetric video for forecasting instead of extracting and modelling neuron traces. Specifically, we train a model directly on the video and mask the output to optimize the per-neuron mean absolute error (MAE). We find that a UNet performs particularly well for small temporal context and can more effectively utilize spatial contextual information than trace-based time series models.
  • Figure 2: Illustration of potential loss of information when segmenting neurons. The colored objects are predicted segmentation masks. A fragment of a 2D slice of the activity video is shown in greyscale.
  • Figure 3: Architecture and input sharding overview. A: We use a variation of the UNet architecture ronneberger2015u with 3D spatial input and treat the $C$ input frames as channels. Further, we use a fixed number of features at every resolution to improve scalability. The network is conditioned on the time horizon $H$ and outputs a single volumetric frame at a time, similar to MetNet-3 andrychowicz2023deep. To control for spatial context at constant FLOPs, four blocks at the lowest resolution can be replaced by one block of higher resolution. B: Data loading and the network are spatially sharded, allowing flexible scaling to full-resolution inputs.
  • Figure 4: Comparison of direct MAE and lead-time conditioned variants.
  • Figure 5: Validation and test performance for varying temporal context sizes $C$ and spatial context sizes $S$, with networks having comparable FLOPs. We find a trade-off between spatial and temporal context, with a cross-over point between $C=16$ and $C=64$ beyond which spatial context stops being useful and leads to overfitting. The periodicity of many conditions is roughly $64$, which might explain why spatial context becomes redundant. We report the mean and two standard errors.
  • ...and 8 more figures
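Figure 1 describes masking the model's volumetric output so that training optimizes the per-neuron MAE. One plausible reading, sketched here under assumptions: a label volume assigns each voxel an integer neuron id (0 for background, 1..N for neurons), each neuron's predicted activity is the mean of the predicted voxels inside its mask, and the loss is the MAE between these values and the reference traces. The function names are illustrative, and the paper's exact masking may differ.

```python
import numpy as np


def per_neuron_means(frame: np.ndarray, labels: np.ndarray, num_neurons: int) -> np.ndarray:
    """Mean predicted value inside each neuron's mask.

    frame: predicted volume; labels: same-shape int array, 0 = background,
    1..num_neurons = neuron ids (assumed labeling scheme).
    """
    if frame.shape != labels.shape:
        raise ValueError("frame and labels must have the same shape")
    flat_labels = labels.ravel()
    sums = np.bincount(flat_labels, weights=frame.ravel(), minlength=num_neurons + 1)
    counts = np.bincount(flat_labels, minlength=num_neurons + 1)
    # Drop the background bin; guard against neurons with no voxels.
    return sums[1:] / np.maximum(counts[1:], 1)


def per_neuron_mae(pred_frame: np.ndarray, labels: np.ndarray, target_traces: np.ndarray) -> float:
    """MAE between mask-averaged predictions and reference per-neuron values."""
    pred = per_neuron_means(pred_frame, labels, len(target_traces))
    return float(np.mean(np.abs(pred - target_traces)))


# Usage on a toy 2x2 "volume" with two neurons and one background voxel.
labels = np.array([[0, 1], [1, 2]])
pred = np.array([[9.0, 1.0], [3.0, 5.0]])
loss = per_neuron_mae(pred, labels, np.array([2.0, 4.0]))
```

In this sketch the background voxel (value 9.0) never enters the loss, so only voxels inside segmented neurons are constrained; neuron 1 averages to 2.0 and neuron 2 to 5.0, giving an MAE of 0.5.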