Table of Contents
Fetching ...

PySlyde: A Lightweight, Open-Source Toolkit for Pathology Preprocessing

Gregory Verghese, Anthony Baptista, Chima Eke, Holly Rafique, Mengyuan Li, Fathima Mohamed, Ananya Bhalla, Lucy Ryan, Michael Pitcher, Enrico Parisini, Concetta Piazzese, Liz Ing-Simmons, Anita Grigoriadis

TL;DR

PySlyde, a lightweight, open-source Python toolkit built on OpenSlide to simplify and standardise WSI preprocessing, streamlines WSI preprocessing, enhances reproducibility, and accelerates the generation of AI-ready datasets, enabling researchers to focus on model development and downstream analysis.

Abstract

The integration of artificial intelligence (AI) into pathology is advancing precision medicine by improving diagnosis, treatment planning, and patient outcomes. Digitised whole-slide images (WSIs) capture rich spatial and morphological information vital for understanding disease biology, yet their gigapixel scale and variability pose major challenges for standardisation and analysis. Robust preprocessing, covering tissue detection, tessellation, stain normalisation, and annotation parsing is critical but often limited by fragmented and inconsistent workflows. We present PySlyde, a lightweight, open-source Python toolkit built on OpenSlide to simplify and standardise WSI preprocessing. PySlyde provides an intuitive API for slide loading, annotation management, tissue detection, tiling, and feature extraction, compatible with modern pathology foundation models. By unifying these processes, it streamlines WSI preprocessing, enhances reproducibility, and accelerates the generation of AI-ready datasets, enabling researchers to focus on model development and downstream analysis.

PySlyde: A Lightweight, Open-Source Toolkit for Pathology Preprocessing

TL;DR

PySlyde, a lightweight, open-source Python toolkit built on OpenSlide to simplify and standardise WSI preprocessing, streamlines WSI preprocessing, enhances reproducibility, and accelerates the generation of AI-ready datasets, enabling researchers to focus on model development and downstream analysis.

Abstract

The integration of artificial intelligence (AI) into pathology is advancing precision medicine by improving diagnosis, treatment planning, and patient outcomes. Digitised whole-slide images (WSIs) capture rich spatial and morphological information vital for understanding disease biology, yet their gigapixel scale and variability pose major challenges for standardisation and analysis. Robust preprocessing, covering tissue detection, tessellation, stain normalisation, and annotation parsing is critical but often limited by fragmented and inconsistent workflows. We present PySlyde, a lightweight, open-source Python toolkit built on OpenSlide to simplify and standardise WSI preprocessing. PySlyde provides an intuitive API for slide loading, annotation management, tissue detection, tiling, and feature extraction, compatible with modern pathology foundation models. By unifying these processes, it streamlines WSI preprocessing, enhances reproducibility, and accelerates the generation of AI-ready datasets, enabling researchers to focus on model development and downstream analysis.

Paper Structure

This paper contains 5 sections, 1 figure.

Figures (1)

  • Figure 1: Schematic overview of the WSI processing workflow supported by PySlyde. The overview is demonstrating a sample of functionality across the main classes and modules Slide, WSIParser, FeatureGenerator and the IO. Top row, from left to right: WSI loading using compatible formats through OpenSlide; annotation mask generation with support for QuPath or ImageJ formats; and preprocessing steps such as tissue detection. Bottom row, from left to right: Tessellation of WSIs into tissue tiles using the WSIParser class; feature extraction from tiles with support for pretrained encoders such as Virchow2, H-optimus, Gigapath, or UNI; and structured storage of tile embeddings and metadata using RocksDB, NumPy, or LMDB.