
Toward Generalizable Whole Brain Representations with High-Resolution Light-Sheet Data

Minyoung E. Kim, Dae Hee Yun, Aditi V. Patel, Madeline Hon, Webster Guan, Taegeon Lee, Brian Nguyen

Abstract

Unprecedented visual details of biological structures are being revealed by subcellular-resolution whole-brain 3D microscopy data, enabled by recent advances in intact tissue processing and light-sheet fluorescence microscopy (LSFM). These volumetric data offer rich morphological and spatial cellular information; however, the lack of scalable data processing and analysis methods tailored to these petabyte-scale data poses a substantial challenge for accurate interpretation. Further, existing models for visual tasks such as object detection and classification struggle to generalize to this type of data. To accelerate the development of suitable methods and foundational models, we present CANVAS, a comprehensive set of high-resolution whole mouse brain LSFM benchmark data, encompassing six neuronal and immune cell-type markers, along with cell annotations and a leaderboard. We also demonstrate challenges in generalizing baseline models built on existing architectures, especially due to the heterogeneity in cellular morphology across phenotypes and anatomical locations in the brain. To the best of our knowledge, CANVAS is the first and largest LSFM benchmark that captures intact mouse brain tissue at the subcellular level and includes extensive annotations of cells throughout the brain.

Figures (12)

  • Figure 1: CANVAS: A high-resolution light-sheet benchmark dataset, representing 6 different cell types at subcellular resolution.
  • Figure 2: CANVAS Dataset. (a) Overview of datasets in CANVAS showing six cell type markers imaged using light sheet fluorescence microscopy. Markers include NeuN (cyan), IBA1 (magenta), GFAP (yellow), TH (red), cFOS (green), and PV (grey). All markers except PV are based on immunolabeling; PV is transgenically labeled with fluorescent proteins. All datasets represent whole-brain imaging, except cFOS, which is hemisphere-only. Images are 500 µm maximum intensity projections. Scale bar: 2 mm. (b) Zoomed-in views from the CANVAS dataset showing six cell type markers across brain regions, with single-cell insets at bottom right of each panel: NeuN (cyan), IBA1 (magenta), GFAP (yellow), TH (red), cFOS (green), and PV (grey), presented as 80 µm maximum intensity projections. Images include the following brain regions: HIP (hippocampus), VTA (ventral tegmental area), CP (caudate putamen), ZI (zona incerta), CEA (central amygdalar nucleus), RT (reticular nucleus of the thalamus), SS (somatosensory areas), VIS (visual areas), LC (locus coeruleus), MO (somatomotor areas), and mPFC (medial prefrontal cortex); and fiber tracts: cc (corpus callosum) and fr (fasciculus retroflexus). Scale bar: 100 µm.
  • Figure 3: 3D-MAE training results. Top: Convergence (loss, PSNR, SSIM) comparison across all markers plus the all-markers model (dashed lines) using the best configuration (16×32×32 crop / 4×8×8 patch, mask ratio m = 0.15). Bottom: Effect of crop/patch size. The GFAP dataset performs best with the 24×48×48/6×12×12 configuration, while for IBA1 and NeuN the smallest configuration (blue) outperforms larger alternatives.
  • Figure 4: Example β-amyloid (yellow) segmentation results for (a) ground truth, (b) µSAM, and (c) CellPose-SAM. (d) µSAM segmentation overlaid with GFAP (magenta), nuclear stain (cyan), and detected nuclei cell centers (yellow).
  • Figure 5: NeuN reconstruction with varying mask ratios. Compact neuronal nuclei reconstructed with 16×32×32 crop, 4×8×8 patch. Top to bottom: mask ratio 0.15, 0.55, 0.75. Lower masking preserves more morphological detail.
  • ...and 7 more figures
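The 3D-MAE configurations referenced in Figures 3 and 5 pair a crop size with a patch size and a mask ratio, e.g. a 16×32×32 crop divided into 4×8×8 patches with m = 0.15. As a minimal sketch (not the authors' implementation; `patchify_3d` and `random_mask` are illustrative names, and the input is random data standing in for an LSFM crop), the patching and random-masking step can be expressed in a few lines of numpy:

```python
import numpy as np

def patchify_3d(volume, patch=(4, 8, 8)):
    """Split a 3D crop into non-overlapping patches, shape (N, pz*py*px)."""
    Z, Y, X = volume.shape
    pz, py, px = patch
    assert Z % pz == 0 and Y % py == 0 and X % px == 0
    v = volume.reshape(Z // pz, pz, Y // py, py, X // px, px)
    v = v.transpose(0, 2, 4, 1, 3, 5)  # group patch axes together
    return v.reshape(-1, pz * py * px)

def random_mask(n_patches, mask_ratio, rng):
    """Boolean mask over patches: True = masked (hidden from the encoder)."""
    n_mask = int(n_patches * mask_ratio)
    mask = np.zeros(n_patches, dtype=bool)
    mask[rng.permutation(n_patches)[:n_mask]] = True
    return mask

rng = np.random.default_rng(0)
crop = rng.random((16, 32, 32))      # stand-in for one 16x32x32 LSFM crop
patches = patchify_3d(crop)          # 64 patches of 4*8*8 = 256 voxels each
mask = random_mask(len(patches), 0.15, rng)
visible = patches[~mask]             # only visible patches feed the encoder
```

Under this configuration the crop yields a 4×4×4 grid of 64 patches, of which int(64 × 0.15) = 9 are masked; only the 55 visible patches would be encoded, and the reconstruction loss would be computed on the masked ones.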