Cross-Species Data Integration for Enhanced Layer Segmentation in Kidney Pathology
Junchao Zhu, Mengmeng Yin, Ruining Deng, Yitian Long, Yu Wang, Yaohong Wang, Shilin Zhao, Haichun Yang, Yuankai Huo
TL;DR
This work tackles the challenge of limited annotated human kidney histopathology data for cortex–medulla layer segmentation by leveraging cross-species homologous data from mouse kidneys. It introduces a Cross-species Training Framework that jointly trains on human and mouse PAS-stained images. The framework uses a hybrid loss $L_s$ for the independent tasks and a joint loss $L_t$ based on a weighted Focal loss to handle class imbalance, and it is evaluated with both CNN and Transformer architectures. Empirical results show consistent improvements in both mIoU and Dice score for the human cortex and medulla, along with better model generalization when cross-species data are used. The findings indicate that low-noise cross-species data can augment learning when clinical samples are limited. This has practical implications for kidney pathology analysis and, more broadly, for cross-species data integration in medical image segmentation. Code is publicly available at https://github.com/hrlblab/layer_segmentation.
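The exact definitions of $L_s$ and $L_t$ are not given in this summary. As a point of reference, the sketch below implements a standard class-weighted (alpha-balanced) multi-class focal loss, the general technique that a "weighted Focal loss" usually refers to. The function name, the per-class weights, and the default `gamma=2.0` are illustrative assumptions, not the authors' settings. Consult the linked repository for how the framework actually weights classes or combines human and mouse samples.

```python
import numpy as np

def weighted_focal_loss(probs, labels, class_weights, gamma=2.0, eps=1e-7):
    """Class-weighted multi-class focal loss, averaged over pixels.

    Illustrative sketch only (hypothetical helper, not the paper's code).
    probs: (N, C) predicted class probabilities, one row per pixel.
    labels: (N,) integer ground-truth class indices.
    class_weights: (C,) per-class weights (alpha); values are assumptions.
    gamma: focusing parameter; gamma=0 reduces to weighted cross-entropy.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    class_weights = np.asarray(class_weights, dtype=float)
    # Probability assigned to the true class of each pixel, clipped for log safety.
    p_t = np.clip(probs[np.arange(len(labels)), labels], eps, 1.0)
    # Per-pixel weight taken from its ground-truth class (counters class imbalance).
    alpha_t = class_weights[labels]
    # (1 - p_t)^gamma shrinks the loss of easy, well-classified pixels.
    loss = -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)
    return float(loss.mean())
```

With `gamma=0` and unit weights this equals ordinary cross-entropy. Raising `gamma` lowers the contribution of confidently correct pixels, and a larger weight for a minority class (for example, a thin layer occupying few pixels) increases its share of the gradient.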
Abstract
Accurate delineation of the boundaries between the renal cortex and medulla is crucial for subsequent structural and functional analysis and for disease diagnosis. Training high-quality deep-learning models for layer segmentation requires large amounts of annotated data. However, because of patient privacy concerns and the scarcity of clinical cases, constructing pathological datasets from clinical sources is difficult and expensive. Moreover, using external natural-image datasets introduces noise during domain generalization. Cross-species homologous data, such as mouse kidney data, exhibit high structural and feature similarity to human kidneys and therefore have the potential to improve model performance on human datasets. In this study, we incorporated a privately collected Periodic Acid-Schiff (PAS) stained mouse kidney dataset into the human kidney dataset for joint training. After cross-species homologous data were introduced, semantic segmentation models based on CNN and Transformer architectures achieved average increases of 1.77% and 1.24% in mIoU, and 1.76% and 0.89% in Dice score, respectively, on the human renal cortex and medulla datasets. The approach also improves the models' generalization ability. These results indicate that cross-species homologous data, as a low-noise trainable data source, can help improve model performance when clinical samples are limited. Code is available at https://github.com/hrlblab/layer_segmentation.
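The reported gains are expressed in mIoU and Dice score. For readers less familiar with these metrics, the sketch below shows one common way to compute them from integer label maps: per-class IoU and Dice are averaged over the classes present. The function name and the label encoding (for example, 0 for cortex and 1 for medulla) are hypothetical. The authors' evaluation code may treat background, empty classes, and per-image versus dataset-level averaging differently.

```python
import numpy as np

def mean_iou_and_dice(pred, target, num_classes):
    """Mean IoU and mean Dice over classes for two integer label maps.

    Illustrative sketch (hypothetical helper). Classes absent from both
    pred and target are skipped to avoid 0/0.
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    ious, dices = [], []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        inter = np.logical_and(p, t).sum()
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class missing from both maps
        ious.append(inter / union)                   # |P ∩ T| / |P ∪ T|
        dices.append(2 * inter / (p.sum() + t.sum()))  # 2|P ∩ T| / (|P| + |T|)
    return float(np.mean(ious)), float(np.mean(dices))
```

For example, with `target = [[0, 0], [1, 1]]` and `pred = [[0, 1], [1, 1]]`, class 0 has IoU 0.5 and Dice 2/3, and class 1 has IoU 2/3 and Dice 0.8. The function therefore returns roughly (0.583, 0.733).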
