Domain-Specific Foundation Model Improves AI-Based Analysis of Neuropathology

Ruchika Verma, Shrishtee Kandoi, Robina Afzal, Shengjia Chen, Jannes Jegminat, Michael W. Karlovich, Melissa Umphlett, Timothy E. Richardson, Kevin Clare, Quazi Hossain, Jorge Samanamud, Phyllis L. Faust, Elan D. Louis, Ann C. McKee, Thor D. Stein, Jonathan D. Cherry, Jesse Mez, Anya C. McGoldrick, Dalilah D. Quintana Mora, Melissa J. Nirenberg, Ruth H. Walker, Yolfrankcis Mendez, Susan Morgello, Dennis W. Dickson, Melissa E. Murray, Carlos Cordon-Cardo, Nadejda M. Tsankova, Jamie M. Walker, Diana K. Dangoor, Stephanie McQuillan, Emma L. Thorn, Claudia De Sanctis, Shuying Li, Thomas J. Fuchs, Kurt Farrell, John F. Crary, Gabriele Campanella

TL;DR

Neuropathology differs markedly from general surgical pathology, prompting the development of NeuroFM, a domain-specific foundation model pretrained on ~1B brain tissue tiles to capture neurodegenerative patterns. Trained with DINOv2 on ViT-L, NeuroFM demonstrates superior performance versus public pathology FMs across 60 downstream brain tasks, including Braak staging, ADNC metrics, mixed dementia, vascular pathology, ataxia, and hippocampal segmentation, often with smaller models than competitors. Comprehensive ablations reveal that using ~80% neuropathology with ~20% general pathology data yields the best representations, and cross-stain transfer from H&E to IHC tasks is feasible, highlighting practical robustness. The study argues for targeted domain-specific pretraining in digital pathology, showing meaningful gains in clinically critical neuropathology endpoints and offering a blueprint for specialized foundation models in other organ systems.

Abstract

Foundation models have transformed computational pathology by providing generalizable representations from large-scale histology datasets. However, existing models are predominantly trained on surgical pathology data, which is enriched for non-nervous tissue and overrepresents neoplastic, inflammatory, metabolic, and other non-neurological diseases. Neuropathology represents a markedly different domain of histopathology, characterized by unique cell types (neurons, glia, etc.), distinct cytoarchitecture, and disease-specific pathological features including neurofibrillary tangles, amyloid plaques, Lewy bodies, and pattern-specific neurodegeneration. This domain mismatch may limit the ability of general-purpose foundation models to capture the morphological patterns critical for interpreting neurodegenerative diseases such as Alzheimer's disease, Parkinson's disease, and cerebellar ataxias. To address this gap, we developed NeuroFM, a foundation model trained specifically on whole-slide images of brain tissue spanning diverse neurodegenerative pathologies. NeuroFM demonstrates superior performance compared to general-purpose models across multiple neuropathology-specific downstream tasks, including mixed dementia disease classification, hippocampal region segmentation, and neurodegenerative ataxia identification encompassing cerebellar essential tremor and spinocerebellar ataxia subtypes. This work establishes that domain-specialized foundation models trained on brain tissue can better capture neuropathology-specific features than models trained on general surgical pathology datasets. By tailoring foundation models to the unique morphological landscape of neurodegenerative diseases, NeuroFM enables more accurate and reliable AI-based analysis for brain disease diagnosis and research, setting a precedent for domain-specific model development in specialized areas of digital pathology.

Paper Structure

This paper contains 38 sections, 75 figures, 10 tables.

Figures (75)

  • Figure 1: Overview of the NeuroFM foundation model for computational neuropathology. (a) Whole slide images (WSI) are tiled from diverse anatomical regions, comprising 80% brain tissue and 20% general pathology specimens. (b) Anatomical distribution of neuropathology slides (top, n=3,660 shown; rest of the neuropathology slides map to these regions) and specimen type distribution of general service pile tiles (bottom, 200M tiles total). (c) Distribution of 59 downstream tasks categorized into 11 disease categories, including brain tumors, neurodegenerative diseases, and neuropathological conditions. (d) Geographic distribution of institutions contributing slides across the United States and England, with abbreviations and institutional affiliations listed by region. (e) NeuroFM pretraining pipeline using self-supervised learning on whole slide images (WSI) with tissue detection, tile extraction, and vision transformer architecture with local/global views feeding student/teacher networks. (f) Downstream task applications include slide-level classification or regression for disease categorization and patch-level coarse segmentation for anatomical structure identification.
  • Figure 2: Comprehensive performance evaluation of NeuroFM against state-of-the-art general-purpose pathology foundation models. (A) Overall performance distribution across all neuropathology classification tasks shown as boxplots. NeuroFM achieved the highest mean AUC and demonstrated significantly better performance compared to UNI ($p<0.01$), Virchow2 ($p<0.01$), and Virchow ($p<0.001$), as indicated by asterisks. Each boxplot displays the median, interquartile range, and individual task performances as scattered points. (B) Performance breakdown by disease category showing mean AUC across all encoders. NeuroFM demonstrates consistent advantages across multiple categories, particularly in Neurodegeneration Ataxia, Neurodegeneration Mixed Dementia, Alzheimer's Disease Neuropathologic Change, Cerebrovascular Pathology, and Coarse Segmentation tasks. (C) Model scale versus performance analysis showing poor correlation (R²=0.011), demonstrating that larger models do not necessarily achieve better performance. NeuroFM outperforms models with substantially more parameters (Gigapath: 1.1B, UNI2: 681M, Virchow and Virchow2: 632M), highlighting the value of domain-specific pretraining over model scale. (D) Head-to-head comparison across all 59 tasks showing performance relative to the best-performing encoder for each task. Each bar represents the number of tasks where each model achieved: wins (green), statistically significantly better performance than the best-performing alternative encoder ($p<0.05$); ties (orange), statistically insignificant differences from the best-performing encoder; or losses (red), statistically significantly poorer performance than the best-performing encoder for that task. NeuroFM leads with 12 wins and only 22 losses, compared to fewer wins and higher loss rates for general-purpose pathology foundation models. (E) Cross-validation distributions for the 12 tasks where NeuroFM achieved statistically significant superiority over the best-performing general-purpose foundation models. Boxplots show performance distributions across 20 Monte Carlo cross-validation splits with median values annotated. Asterisks denote statistical significance between NeuroFM and the best competitor: * ($p<0.05$), ** ($p<0.01$), *** ($p<0.001$).
  • Figure 3: Comprehensive ablation studies of NeuroFM. (A) Data composition ablation: Training on 80% neuropathology with 20% general pathology H&E data (NeuroFM) achieves significantly superior performance (mean AUROC=0.720) compared to models trained exclusively on neuropathology H&E (NP_HE, 0.708), combined H&E and IHC (NP_Multistain, 0.701), or IHC-only (NP_IHC, 0.697) across 55 classification tasks. Boxplots show performance distributions across all tasks with mean values annotated. Asterisks denote statistical significance determined by Wilcoxon signed-rank test between NeuroFM and the other variants: * ($p<0.05$), ** ($p<0.01$), *** ($p<0.001$). (B) Architecture ablation: ViT-Large (NeuroFM, 0.720) achieves significantly better performance than ViT-Giant (NP_HE_G, 0.715) when trained on the same H&E dataset, demonstrating that the larger architecture does not improve performance. (C) Performance by disease category demonstrates NeuroFM's (purple bars) consistent performance advantages across multiple neurodegenerative disorder categories including Neurodegeneration Ataxia, Neurodegeneration Mixed Dementia, Neuroinfection HIV, Brain Atrophy, and Cerebrovascular Pathology compared to other model variants. (D) Cross-stain generalization on 34 IHC-specific tasks shows H&E-trained NeuroFM (0.612) matches and slightly exceeds the IHC-specialized models (NP_IHC: 0.609; NP_Multistain: 0.608), demonstrating that learned representations transfer across staining modalities without explicit IHC training.
  • Figure 4: Comprehensive evaluation of NeuroFM against general-purpose foundation models across stain- and region-specific neuropathology tasks on the NACC cohort. (A) Distribution of slides across four brain regions and six stains. (B) Box plots showing the distribution of performance differences (AUROC) between NeuroFM and competing models across all classification tasks; asterisks indicate statistical significance levels. (C) Bar chart comparing mean AUC performance of six foundation models across four disease categories. (D) Stacked bar chart showing the task-level win/tie/loss distribution for each FM relative to the best model per task across 65 tasks, categorized as wins (green: statistically significantly better), ties (orange: no significant difference), and losses (red: significantly worse). NeuroFM achieved 9 wins, 36 ties, and 20 losses against the best-performing alternative encoder per task. (E) Box plots displaying the nine tasks where NeuroFM significantly outperformed the best alternative encoder ($p<0.05$) across different stains and brain regions. Each plot shows individual data points, median values, and statistical significance.
  • Figure 5: Model performance on development tasks during pretraining. Performance is measured by mean area under the curve (AUC) for multiple development classification tasks, including various neurological conditions from the MHBB, NPBB, and PWG datasets, across five model variants: NP_HE_G (ViT-G), NeuroFM (ViT-L), NP_HE (ViT-L), NP_Multistain (ViT-L), and NP_IHC (ViT-L). Individual task performance is shown as thin colored lines, with the mean AUC across all tasks displayed as a bold red line with markers. The shaded region represents $\pm$1 standard deviation from the mean. The star ($\star$) indicates the checkpoint selected based on early stopping criteria (maximum mean development task AUC). Training iterations are shown as percentages (0–100%) on the x-axis.
  • ...and 70 more figures
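
The win/tie/loss tallies described in the Figure 2 and Figure 4 captions can be illustrated with a minimal sketch. The captions do not say which significance test underlies those tallies, so the paired sign-flip permutation test below is a stand-in chosen only because it needs nothing beyond the Python standard library; it is an assumption, not the authors' procedure. The function names, the `scores` layout (encoder name mapped to per-split AUCs on identical splits), and the rule of comparing against the best other encoder by mean AUC are likewise hypothetical.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def paired_permutation_pvalue(a, b, n_perm=2000, seed=0):
    """Two-sided sign-flip permutation test on paired differences a[i] - b[i].

    Stand-in test (assumption): the source does not name the test it uses here.
    """
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(_mean(diffs))
    hits = 0
    for _ in range(n_perm):
        # Under the null, the sign of each paired difference is arbitrary.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(_mean(flipped)) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def classify_task(scores, model, alpha=0.05):
    """Label `model` as 'win', 'tie', or 'loss' on a single task.

    scores: dict of encoder name -> list of per-split AUCs (same splits).
    The reference is the best *other* encoder by mean AUC (hypothetical rule).
    """
    rival = max((m for m in scores if m != model), key=lambda m: _mean(scores[m]))
    p = paired_permutation_pvalue(scores[model], scores[rival])
    if p >= alpha:
        return "tie"
    return "win" if _mean(scores[model]) > _mean(scores[rival]) else "loss"


def tally(tasks, model, alpha=0.05):
    """Count wins, ties, and losses for `model` over a dict of tasks."""
    counts = {"win": 0, "tie": 0, "loss": 0}
    for scores in tasks.values():
        counts[classify_task(scores, model, alpha)] += 1
    return counts


if __name__ == "__main__":
    # Toy, made-up AUCs over 20 splits; not data from the paper.
    n = 20
    toy_tasks = {
        "task_a": {"model_x": [0.80 + 0.001 * i for i in range(n)],
                   "model_y": [0.75 + 0.001 * i for i in range(n)]},
        "task_b": {"model_x": [0.70 + (0.01 if i % 2 == 0 else -0.01) for i in range(n)],
                   "model_y": [0.70] * n},
    }
    print(tally(toy_tasks, "model_x"))
```

With 20 splits per task, as in the Monte Carlo cross-validation mentioned in the Figure 2 caption, a consistent per-split gap is flagged as significant while a gap that alternates in sign is counted as a tie.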