
CytoNet: A Foundation Model for the Human Cerebral Cortex

Christian Schiffer, Zeynep Boztoprak, Jan-Oliver Kropp, Julia Thönnißen, Katia Berr, Hannah Spitzer, Katrin Amunts, Timo Dickscheid

TL;DR

CytoNet introduces a foundation model for human cortical organization trained with SpatialNCE on millions of unlabeled microscopic patches, using anatomical proximity as a natural self-supervised signal. By mapping patches into a shared feature space and aligning them with a common reference (MNI Colin27), CytoNet captures laminar and areal cytoarchitecture that generalizes across brains and scales. The model achieves strong performance in brain area classification, cortical layer segmentation, structural variation prediction, and data-driven parcellation, while offering interpretable insights through attention maps and cross-brain spatial encoding. This approach enables scalable, anatomically grounded brain mapping at terabyte-to-petabyte scales and lays the foundation for multimodal integration and comprehensive analyses of cortical microstructure across individuals.
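The TL;DR describes SpatialNCE only at a high level: distances between patch locations in the common reference space decide how similar the patches' features should be. The exact loss is not given in this section. What follows is therefore a minimal pure-Python sketch under stated assumptions, not the authors' formulation. It treats SpatialNCE as a soft-target InfoNCE loss. A Gaussian kernel on 3D reference-space distance, with a hypothetical length scale `sigma`, defines the target distribution over the other patches in a batch. Temperature-scaled cosine similarity, with a hypothetical temperature `tau`, defines the predicted distribution. All function names are illustrative.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def log_softmax(scores):
    """Numerically stable log-softmax over a list of scores."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum for s in scores]


def spatial_nce_sketch(features, coords, tau=0.1, sigma=5.0):
    """Hypothetical soft-target InfoNCE driven by reference-space distances.

    features: one embedding vector per image patch
    coords:   3D location of each patch in the common reference space
    tau:      temperature for feature similarities (assumed value)
    sigma:    spatial length scale, in the units of coords (assumed value)
    """
    n = len(features)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Assumed target: Gaussian kernel on squared 3D distance, normalized
        # over the batch, so spatially close patches act as soft positives.
        spatial_scores = [
            -math.dist(coords[i], coords[j]) ** 2 / (2 * sigma ** 2) for j in others
        ]
        targets = [math.exp(s) for s in log_softmax(spatial_scores)]
        # Predicted distribution: softmax over temperature-scaled cosine similarities.
        log_preds = log_softmax(
            [cosine_similarity(features[i], features[j]) / tau for j in others]
        )
        # Cross-entropy between the spatial targets and the predicted distribution.
        total -= sum(t * lp for t, lp in zip(targets, log_preds))
    return total / n


# Toy example: two pairs of patches, each pair close together in reference space.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (50.0, 0.0, 0.0), (51.0, 0.0, 0.0)]
aligned = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]   # features follow space
shuffled = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]  # features ignore space
print(spatial_nce_sketch(aligned, coords) < spatial_nce_sketch(shuffled, coords))  # True
```

The loss is low when features of spatially neighboring patches are similar, and high when feature similarity disagrees with anatomical proximity. This matches the intuition that proximity in the reference space serves as the self-supervised signal.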

Abstract

To study how the human brain works, we need to explore the organization of the cerebral cortex and its detailed cellular architecture. We introduce CytoNet, a foundation model that encodes high-resolution microscopic image patches of the cerebral cortex into highly expressive feature representations, enabling comprehensive brain analyses. CytoNet employs self-supervised learning using spatial proximity as a powerful training signal, without requiring manual labelling. The resulting features are anatomically sound and biologically relevant. They encode general aspects of cortical architecture and unique brain-specific traits. We demonstrate top-tier performance in tasks such as cortical area classification, cortical layer segmentation, cell morphology estimation, and unsupervised brain region mapping. As a foundation model, CytoNet offers a consistent framework for studying cortical microarchitecture, supporting analyses of its relationship with other structural and functional brain features, and paving the way for diverse neuroscientific investigations.

Paper Structure

This paper contains 26 sections, 2 equations, 10 figures, 7 tables.

Figures (10)

  • Figure 1: Illustration of the self-supervised pretraining workflow using the proposed SpatialNCE loss in CytoNet. Spatial transformations between the MNI Colin 27 3D reference coordinate space (A; Holmes1998) and microscopic scans of histological sections (C) of postmortem human brains (B) were used to link high-resolution microscopic image patches (E) to their corresponding 3D locations in the common reference space. This makes it possible to estimate distances between image patches sampled from different brains. These distances were used to compute similarity scores for the proposed SpatialNCE contrastive loss (D), which promotes the extraction of expressive feature vectors for each image patch.
  • Figure 2: Anatomical plausibility of feature representations learned by CytoNet-ViT (1M). Top: 2D UMAP plot of the learned latent space, color-coded by the maximum probability labels of the corresponding coordinates in the Julich Brain Atlas (version 3.1, Amunts2020) as an approximate assignment to brain areas. Brain-specific clusters fan out along the first UMAP dimension, while the second UMAP dimension shows a transition from the occipital to the frontal pole. A gap along the anterior-posterior axis aligns with the central sulcus, which marks a prominent structural and functional division. The cluster corresponding to B9.0, which was not included during pretraining, appears more compact than the other clusters but shows a comparable cytoarchitectonic organization. Bottom: Aggregated pairwise cosine similarity of features across ten brains. Cosine similarity was computed between feature vectors of image patches, grouped by Julich Brain Atlas labels, and averaged for each pair of areas. Rows and columns represent brain areas, ordered by hemisphere, lobe, and label. Area names are omitted for clarity (see the supplementary table of Julich Brain areas).
  • Figure 3: Attention maps from the first self-attention layer of CytoNet-ViT (1M). The figure includes example patches from areas hOc1 (primary visual cortex, Amunts2000), 4a (primary motor cortex, Geyer1996), and 3b (primary somatosensory cortex, Geyer1999). Each row shows the input image (left) and attention scores of all 12 heads overlaid on the image (red = stronger attention). Highlighted are the stripe of Gennari in the primary visual cortex (top), Betz giant cells in layer V of motor cortex (center), and a pronounced layer IV in somatosensory cortex (bottom). Attention scores were gamma transformed ($\gamma=0.5$) to aid visualization.
  • Figure 4: Comparison of predictive performance between intensity profiles and CytoNet-ViT (1M) features in B20.0. A: Linear regression models using varying subsets of PCA components achieved substantially higher $R^2$ scores with CytoNet features than with intensity profiles for all evaluated structural and morphological properties. Reported values are the average $R^2$ across 5-fold cross-validation. B: Absolute feature importance scores, derived from the regression coefficients for the first 32 PCA components of CytoNet features, showed that components 1–3 were strongly associated with spatial location in MNI space and with the density of cortical layer IV. Other properties were predominantly encoded in higher components. L1 to L6 denote cortical layers I to VI. C: The cumulative explained variance across PCA components indicates that CytoNet features capture substantially more variance than intensity profiles.
  • Figure 5: Performance of cytoarchitectonic brain area classification using CytoNet. A: Macro-F1 scores obtained by linear probing of different models. Mean and standard deviation over three training runs are reported. See the supplementary score table for more detailed scores. Where applicable, the number of pretraining samples is indicated after the model name. B: Distribution of prediction errors of CytoNet-ViT (1M) by error distance for seen, transfer, and unseen brains. Error distance was defined as the number of hops between the predicted and true brain area in the adjacency graph of the Julich Brain Atlas 3.1 (Amunts2020). 1-hop errors correspond to directly adjacent areas, and larger distances reflect increasing topological separation. C: Boxplots of the logit margins of CytoNet-ViT (1M) predictions, stratified by error distance. The logit margin, i.e. the difference between the two largest logits, serves as a proxy for model confidence and distance to the decision boundary (Ngnawe2024). Correct predictions show higher confidence, while incorrect predictions show decreasing confidence with increasing error distance.
  • ...and 5 more figures
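Panels B and C of Figure 5 rely on two simple quantities. The first is the hop distance between the predicted and true area in an area adjacency graph. The second is the logit margin, the difference between the two largest logits. The pure-Python sketch below shows one way to compute both. The toy graph with areas `A` to `E` is hypothetical and stands in for the Julich Brain Atlas 3.1 adjacency graph. The function names are illustrative, not taken from the authors' code.

```python
from collections import Counter, deque


def hop_distance(adjacency, source, target):
    """Shortest-path length (in edges) between two areas, found by BFS.

    Returns 0 for a correct prediction and None if the areas are not connected.
    """
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def error_distance_histogram(adjacency, y_true, y_pred):
    """Count predictions by hop distance between predicted and true area."""
    return Counter(hop_distance(adjacency, t, p) for t, p in zip(y_true, y_pred))


def logit_margin(logits):
    """Difference between the largest and second-largest logit (needs >= 2 logits)."""
    top1, top2 = sorted(logits, reverse=True)[:2]
    return top1 - top2


# Hypothetical undirected adjacency graph: a chain A-B-C-D plus an isolated area E.
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"], "E": []}
print(hop_distance(adjacency, "A", "D"))  # 3
print(error_distance_histogram(adjacency, ["A", "B", "C"], ["A", "C", "A"]))
print(logit_margin([2.0, 0.5, 1.5]))  # 0.5
```

Breadth-first search returns the minimal number of hops in an unweighted graph. Correct predictions therefore fall into the distance-0 bin, and errors between directly adjacent areas fall into the 1-hop bin, as in panel B.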