PySpatial: A High-Speed Whole Slide Image Pathomics Toolkit
Yuechen Yang, Yu Wang, Tianyuan Yao, Ruining Deng, Mengmeng Yin, Shilin Zhao, Haichun Yang, Yuankai Huo
TL;DR
PySpatial addresses the cost of analyzing large Whole Slide Images (WSIs) by replacing the patch-level workflow typical of CellProfiler pipelines with feature extraction performed directly on computational regions within the WSI. It combines an R-tree spatial index with matrix-based batch computation to preserve spatial context while accelerating the calculation of 247 pathomic features in four categories: Size & Shape, Texture, Intensity, and Intensity Distribution. The authors evaluate PySpatial on two datasets, PEC and KPMP, and report a nearly 10-fold speedup for the small, sparse objects in PEC and about a 2-fold speedup for larger structures such as glomeruli and arteries in KPMP, with feature distributions consistent with those from CellProfiler. They also discuss memory management, describing a matrix size parameter and an alternative object-level API for handling very large objects, both of which are intended to keep PySpatial usable in large-scale digital pathology.
Abstract
Whole Slide Image (WSI) analysis plays a crucial role in modern digital pathology, enabling large-scale feature extraction from tissue samples. However, traditional feature extraction pipelines based on tools like CellProfiler often involve lengthy workflows: the WSI is segmented into patches, features are extracted at the patch level, and the results are then mapped back to the original WSI. To address these challenges, we present PySpatial, a high-speed pathomics toolkit designed specifically for WSI-level analysis. PySpatial streamlines the conventional pipeline by operating directly on computational regions of interest, reducing redundant processing steps. Using R-tree-based spatial indexing and matrix-based computation, PySpatial efficiently maps and processes computational regions, significantly accelerating feature extraction while maintaining high accuracy. Our experiments on two datasets, Perivascular Epithelioid Cell (PEC) and data from the Kidney Precision Medicine Project (KPMP), demonstrate substantial performance improvements. For the smaller, sparse objects in the PEC dataset, PySpatial achieves nearly a 10-fold speedup over standard CellProfiler pipelines. For larger objects, such as glomeruli and arteries in the KPMP dataset, PySpatial achieves a 2-fold speedup. These results highlight PySpatial's potential to handle large-scale WSI analysis with enhanced efficiency and accuracy, paving the way for broader applications in digital pathology.
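
The idea of looking up objects by region instead of tiling the slide into patches can be illustrated with a small sketch. This is not PySpatial's API, which the abstract does not show: the class `GridIndex`, the function `box_features`, and all coordinates below are hypothetical, and a uniform-grid index is used here as a simple standard-library stand-in for the R-tree that the toolkit actually relies on. The sketch only shows how a bounding-box index lets a pipeline fetch the objects inside a computational region and compute simple size features for them in one pass.

```python
from collections import defaultdict


class GridIndex:
    """Uniform-grid bounding-box index (a simple stand-in for an R-tree)."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(set)  # (col, row) -> ids of objects touching that cell
        self.boxes = {}                # object id -> (xmin, ymin, xmax, ymax)

    def _cells_for(self, box):
        # Enumerate every grid cell that the bounding box touches.
        xmin, ymin, xmax, ymax = box
        c = self.cell_size
        for col in range(int(xmin // c), int(xmax // c) + 1):
            for row in range(int(ymin // c), int(ymax // c) + 1):
                yield col, row

    def insert(self, obj_id, box):
        self.boxes[obj_id] = box
        for cell in self._cells_for(box):
            self.cells[cell].add(obj_id)

    def intersection(self, box):
        """Return the sorted ids of objects whose boxes overlap `box`."""
        candidates = set()
        for cell in self._cells_for(box):
            candidates |= self.cells.get(cell, set())
        qx0, qy0, qx1, qy1 = box
        hits = []
        for obj_id in candidates:
            x0, y0, x1, y1 = self.boxes[obj_id]
            # Exact overlap test removes candidates that only share a cell.
            if x0 <= qx1 and x1 >= qx0 and y0 <= qy1 and y1 >= qy0:
                hits.append(obj_id)
        return sorted(hits)


def box_features(index, ids):
    """Compute simple bounding-box size features for a batch of objects."""
    features = {}
    for obj_id in ids:
        x0, y0, x1, y1 = index.boxes[obj_id]
        width, height = x1 - x0, y1 - y0
        features[obj_id] = {"width": width, "height": height, "bbox_area": width * height}
    return features


# Hypothetical object bounding boxes in WSI pixel coordinates.
index = GridIndex(cell_size=512)
index.insert(0, (100, 100, 160, 150))      # small object near the origin
index.insert(1, (5000, 5200, 5900, 6100))  # large object far from the region below
index.insert(2, (480, 490, 530, 540))      # object straddling a grid-cell border

region = (0, 0, 1024, 1024)                # one computational region
ids = index.intersection(region)           # only objects 0 and 2 overlap it
features = box_features(index, ids)
```

Object 2 crosses a cell border but is still returned exactly once and measured whole, which is the kind of spatial context that a fixed patch grid would otherwise split. A real pipeline would replace the bounding-box features with mask-based measurements and, as the abstract describes, batch them as matrix operations.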
