Adaptive Multi-Scale Integration Unlocks Robust Cell Annotation in Histopathology Images

Yinuo Xu, Yan Cui, Mingyao Li, Zhi Huang

TL;DR

NuClass addresses the gap in per-cell phenotyping in histopathology by combining nucleus-focused morphology with tissue-context signals in a gated, multi-scale framework. It uses a marker-guided Xenium dataset spanning 8 organs and 16 classes to train two specialized paths (Path local for nuclear features and Path global for contextual cues) and a per-cell gate that fuses their predictions in probability space. The approach yields strong, interpretable performance across three held-out cohorts, reaching up to 96% F1 on its best class while producing calibrated confidence estimates and Grad-CAM explanations. The work demonstrates that selective, uncertainty-aware fusion of multi-scale information can bridge slide-level foundation models and reliable cell-level phenotype prediction, with practical implications for scalable, cross-organ cellular analysis in pathology.

Abstract

Identifying cell types and subtypes in routine histopathology is fundamental for understanding disease. Existing tile-based models capture nuclear detail but miss the broader tissue context that influences cell identity. Current human annotations are coarse-grained and uneven across studies, making fine-grained, subtype-level classification difficult. To address the lack of high-quality annotations, we build a marker-guided dataset from Xenium spatial transcriptomics with single-cell-resolution labels for more than two million cells across eight organs and 16 classes. Leveraging this data resource, we introduce NuClass, a framework inspired by the pathologist workflow for cell-wise multi-scale integration of nuclear morphology and microenvironmental context. It combines Path local, which focuses on nuclear morphology from 224x224-pixel crops, and Path global, which models the surrounding 1024x1024-pixel neighborhood, through a learnable gating module that balances local and global information. An uncertainty-guided objective directs the global path to prioritize regions where the local path is uncertain, and we provide calibrated confidence estimates and Grad-CAM maps for interpretability. Evaluated on three fully held-out cohorts, NuClass achieves up to 96% F1 for its best-performing class, outperforming strong baselines. Our results demonstrate that multi-scale, uncertainty-aware fusion can bridge the gap between slide-level pathological foundation models and reliable, cell-level phenotype prediction.
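
The gated fusion described in the abstract can be illustrated with a small sketch. It mixes the two experts in probability space as p_mix = (1 - g) * p_local + g * p_global, with one scalar gate g per cell. The `toy_gate` function is a hypothetical stand-in based only on the entropy gap between the two experts; it is not the learned gating module of NuClass, and the function names and the `sharpness` parameter are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def toy_gate(p_local, p_global, sharpness=4.0):
    """Hypothetical gate (assumption, not the trained NuClass gate).

    Returns g in (0, 1); g grows when the local expert is more
    uncertain (higher entropy) than the global expert.
    """
    diff = entropy(p_local) - entropy(p_global)
    return 1.0 / (1.0 + math.exp(-sharpness * diff))


def fuse(p_local, p_global, g):
    """Probability-space mixture: (1 - g) * p_local + g * p_global."""
    return [(1.0 - g) * pl + g * pg for pl, pg in zip(p_local, p_global)]


# One cell, three classes (illustrative logits only).
p_local = softmax([0.2, 0.1, 0.0])   # near-uniform: morphology is ambiguous
p_global = softmax([0.0, 3.0, 0.0])  # peaked: context favours class 1
g = toy_gate(p_local, p_global)
p_mix = fuse(p_local, p_global, g)
```

Because the mixture is a convex combination of two probability vectors, `p_mix` remains a valid distribution for any g between 0 and 1, which is one practical reason to fuse probabilities instead of logits.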

Paper Structure

This paper contains 31 sections, 25 equations, 7 figures, 14 tables.

Figures (7)

  • Figure 1: NuClass: cell-wise multi-scale classification with gating. (Top) From whole-slide images (WSIs), we extract paired fields centered on each cell: a nucleus-scale crop ($224^{2}$ at 0.25 mpp) and a co-registered contextual field of view (FOV; default $1024^{2}$ at 0.25 mpp). Spatial assays (Xenium-like) provide gene-marker profiles and cell centroids, enabling scalable, subcellular annotations. (Stage 1, Path local) A UNI2-h morphology backbone processes the $224^{2}$ crop. A tissue-conditioned FiLM adaptor modulates the features before a linear head outputs per-cell probabilities. (Stage 2, Path global) A DINOv3 ViT-L/16 encoder ingests the $1024^{2}$ contextual FOV; its representation is concatenated with a Path local morphology vector to predict complementary probabilities. (Stage 3, Fusion Gate) A lightweight cell-wise gate receives statistics from both distributions and compact feature projections, then fuses the experts in probability space: $\mathbf{p}_{\text{mix}} = (1-g)\,\mathbf{p}_{\text{local}} + g\,\mathbf{p}_{\text{global}}$. The gate learns which expert to trust for each cell rather than averaging logits.
  • Figure 2: Organ and cohort wise composition. Donut plots of the training set (8 organs) and the hold-out testing cohorts (pancreas, ovary, lung). The figure illustrates the distribution shifts between training and evaluation cohorts.
  • Figure 3: Lung subset: reliability and feature geometry. FiLM (left) vs. Concat (right). FiLM lowers the expected calibration error (ECE; Guo et al., 2017) relative to Concat (0.061 vs. 0.213) and produces more compact, tissue-aligned clusters.
  • Figure 4: Within-class shares of local-only (Path local correct, Path global wrong) vs. global-only (Path global correct, Path local wrong) predictions. The asymmetry indicates complementary failure modes, motivating gated fusion.
  • Figure 5: Token-level Grad-CAM on ovary patches. Columns: input, Path local, LOKI, MUSK, PLIP. Overlays (last block, $\alpha{=}0.45$) on $224^{2}$ patches. Compared with the baselines, Path local concentrates its responses on clusters of nuclei and tissue interfaces, whereas LOKI, MUSK, and PLIP often highlight broad texture patterns or even background regions, indicating weaker alignment with diagnostic structures.
  • ...and 2 more figures
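
The expected calibration error (ECE) reported in the Figure 3 caption can be computed along the following lines. This is a minimal sketch of the standard binned estimator, assuming equal-width confidence bins and a default of 10 bins; the paper's exact binning scheme is not stated in this excerpt, and the function name and the toy inputs are illustrative.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean of |accuracy - confidence| per bin.

    confidences: top-class probability for each cell, in [0, 1].
    correct: 1 if the top-class prediction was right, else 0.
    n_bins: number of equal-width bins (assumed value, not from the paper).
    """
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")

    # Bin b covers [b / n_bins, (b + 1) / n_bins); confidence 1.0 is
    # placed in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


# Toy example: four cells, two confidence levels.
ece = expected_calibration_error([0.95, 0.95, 0.55, 0.55], [1, 1, 1, 0])
```

In the toy example both occupied bins have a gap of 0.05 between mean confidence and accuracy, and each holds half of the cells, so the estimate is approximately 0.05. Lower values indicate that predicted confidence tracks observed accuracy more closely.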