Assessment of Cell Nuclei AI Foundation Models in Kidney Pathology
Junlin Guo, Siqi Lu, Can Cui, Ruining Deng, Tianyuan Yao, Zhewen Tao, Yizhe Lin, Marilyn Lionts, Quan Liu, Juming Xiong, Yu Wang, Shilin Zhao, Catie Chang, Mitchell Wilkes, Mengmeng Yin, Haichun Yang, Yuankai Huo
TL;DR
The study examines how well cell nuclei foundation models generalize to kidney pathology through a large-scale evaluation of three SOTA models (Cellpose, StarDist, CellViT) on 2,542 kidney WSIs. It uses a rating-based curation scheme and cross-model agreement analysis to examine prediction distributions and identify consensus failure patches. CellViT achieves the highest rate of good segmentations, but substantial gaps remain for all three models. The findings highlight specific failure modes and show how ensemble and consensus approaches can guide the development of kidney-domain-specific foundation models with reduced annotation needs. Overall, the work provides a benchmark and practical insights for improving nuclei segmentation in diverse kidney tissues.
Abstract
Cell nuclei instance segmentation is a crucial task in digital kidney pathology. Traditional automatic segmentation methods often generalize poorly to unseen datasets. Recently, foundation models (FMs) have offered a more generalizable solution, potentially enabling the segmentation of any cell type. In this study, we perform a large-scale evaluation of three widely used state-of-the-art (SOTA) cell nuclei foundation models: Cellpose, StarDist, and CellViT. Specifically, we created a highly diverse evaluation dataset of 2,542 kidney whole slide images (WSIs) collected from both human and rodent sources, encompassing various tissue types, sizes, and staining methods. To our knowledge, this is the largest-scale evaluation of its kind to date. Among the evaluated models, CellViT demonstrated the best performance in segmenting nuclei in kidney pathology. However, our quantitative analysis of the prediction distribution shows that no foundation model is perfect: a persistent performance gap remains in general nuclei segmentation for kidney pathology.
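The rating-based analysis and consensus failure detection described in the TL;DR can be illustrated with a minimal sketch. It is not the authors' pipeline: the patch identifiers, the toy ratings, the label names "medium" and "bad", and the helper functions `rating_distribution` and `consensus_failures` are all assumptions made for illustration. Only the three model names and the notion of a "good" rating come from the text.

```python
from collections import Counter

# Hypothetical per-patch ratings for each model. The patch IDs, the values,
# and the labels "medium" and "bad" are illustrative assumptions.
ratings = {
    "patch_001": {"Cellpose": "good", "StarDist": "medium", "CellViT": "good"},
    "patch_002": {"Cellpose": "bad", "StarDist": "bad", "CellViT": "bad"},
    "patch_003": {"Cellpose": "medium", "StarDist": "bad", "CellViT": "good"},
    "patch_004": {"Cellpose": "bad", "StarDist": "bad", "CellViT": "medium"},
}


def rating_distribution(ratings):
    """Return {model: {label: fraction of patches given that label}}."""
    counts = {}
    for per_model in ratings.values():
        for model, label in per_model.items():
            counts.setdefault(model, Counter())[label] += 1
    return {
        model: {label: n / sum(c.values()) for label, n in c.items()}
        for model, c in counts.items()
    }


def consensus_failures(ratings, fail_label="bad"):
    """Return the sorted IDs of patches that every model rated as a failure."""
    return sorted(
        patch
        for patch, per_model in ratings.items()
        if all(label == fail_label for label in per_model.values())
    )


distribution = rating_distribution(ratings)
failures = consensus_failures(ratings)

for model, dist in distribution.items():
    print(model, dist)
print("Consensus failure patches:", failures)
```

In this toy setup, patches on which all three models fail are the natural candidates for targeted annotation, which is one way a consensus analysis could reduce labeling effort.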
