CellViT++: Energy-Efficient and Adaptive Cell Segmentation and Classification Using Foundation Models

Fabian Hörst; Moritz Rempe; Helmut Becker; Lukas Heine; Julius Keyl; Jens Kleesiek

TL;DR

CellViT++ is a robust and efficient open-source framework that addresses key limitations in computational pathology by decoupling segmentation from classification. Its ability to adapt to new cell types with minimal data and its support for automated dataset generation from immunofluorescence (IF) slides significantly reduce the reliance on time-consuming expert annotation.

Abstract

Digital pathology is a cornerstone in the diagnosis and treatment of diseases. A key task in this field is the identification and segmentation of cells in hematoxylin and eosin-stained images. Existing methods for cell segmentation often require extensive annotated datasets for training and are limited to a predefined cell classification scheme. To overcome these limitations, we propose $\text{CellViT}^{\scriptscriptstyle ++}$, a framework for generalized cell segmentation in digital pathology. $\text{CellViT}^{\scriptscriptstyle ++}$ utilizes Vision Transformers with foundation models as encoders to compute deep cell features and segmentation masks simultaneously. To adapt to unseen cell types, we rely on a computationally efficient approach. It requires minimal data for training and leads to a drastically reduced carbon footprint. We demonstrate excellent performance on seven different datasets, covering a broad spectrum of cell types, organs, and clinical settings. The framework achieves remarkable zero-shot segmentation and data-efficient cell-type classification. Furthermore, we show that $\text{CellViT}^{\scriptscriptstyle ++}$ can leverage immunofluorescence stainings to generate training datasets without the need for pathologist annotations. The automated dataset generation approach surpasses the performance of networks trained on manually labeled data, demonstrating its effectiveness in creating high-quality training datasets without expert annotations. To advance digital pathology, $\text{CellViT}^{\scriptscriptstyle ++}$ is available as an open-source framework featuring a user-friendly, web-based interface for visualization and annotation. The code is available at https://github.com/TIO-IKIM/CellViT-plus-plus.
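
The adaptation idea in the abstract (cell features computed alongside the segmentation, then a lightweight classifier trained for new cell types) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Figure 1 caption says cell embeddings equal the Transformer tokens of the last Transformer block, but the centroid-based token lookup, the patch size of 16, the softmax-regression classifier, the synthetic data, and all function names (`cell_embeddings`, `train_linear_classifier`, `predict`) are hypothetical choices made here for illustration.

```python
import numpy as np

def cell_embeddings(tokens, centroids, patch_size=16):
    """Pick one token per cell (assumed rule: the patch containing the centroid).

    tokens: (rows, cols, dim) token grid from a frozen encoder.
    centroids: (n, 2) integer pixel coordinates as (y, x).
    """
    centroids = np.asarray(centroids)
    rows = np.clip(centroids[:, 0] // patch_size, 0, tokens.shape[0] - 1).astype(int)
    cols = np.clip(centroids[:, 1] // patch_size, 0, tokens.shape[1] - 1).astype(int)
    return tokens[rows, cols]

def train_linear_classifier(x, y, n_classes, lr=0.5, epochs=200):
    """Softmax regression trained with plain gradient descent."""
    w = np.zeros((x.shape[1], n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        logits = x @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / len(x)  # gradient of mean cross-entropy
        w -= lr * (x.T @ grad)
        b -= lr * grad.sum(axis=0)
    return w, b

def predict(x, w, b):
    return np.argmax(x @ w + b, axis=1)

# Synthetic stand-in for encoder output: a 16x16 token grid for a 256x256 image.
rng = np.random.default_rng(0)
tokens = rng.normal(0.0, 0.1, size=(16, 16, 8))
tokens[:, :8, 0] += 1.0   # left half of the image: hypothetical class 0
tokens[:, 8:, 0] -= 1.0   # right half of the image: hypothetical class 1

centroids = rng.integers(0, 256, size=(40, 2))
labels = (centroids[:, 1] >= 128).astype(int)

emb = cell_embeddings(tokens, centroids)
w, b = train_linear_classifier(emb, labels, n_classes=2)
accuracy = float((predict(emb, w, b) == labels).mean())
```

Only the small classifier is trained in this sketch; the token grid is treated as fixed, which is one way a classification step can be kept cheap relative to retraining a segmentation network.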

Paper Structure (43 sections, 2 equations, 8 figures, 23 tables)

This paper contains 43 sections, 2 equations, 8 figures, 23 tables.

Results
Discussion
CRediT authorship contribution statement
Methods

Figures (8)

Figure 1: Overview of the $\text{CellViT}^{{ ++}}$ Framework. a Network architecture including the newly introduced cell classification module based on cell embeddings, which are equal to the Transformer tokens of the last Transformer Block. Cell embedding vectors can be extracted in the forward pass in conjunction with the segmentation process. The embeddings are subsequently used to train a cell type classification module that adapts the framework to new cell classes. The segmentation network of $\text{CellViT}^{{ ++}}$ is pretrained using the PanNuke dataset (1). Tissue types highlighted in bold are selected for further analysis in this study. Subsequent classification modules are trained on unseen datasets (2) and the results are combined with the segmentation masks. Cohort analysis and image visualization can be performed with our web-based viewer (3). b Pipeline to automatically derive labels from registered H&E and IF scans using $\text{CellViT}^{{ ++}}$, exemplified by the SegPath dataset.
Figure 2: Ocelot results and comparison with state-of-the-art baseline network SoftCTM. a Mean $F_1$-Score averaged over all tissue types in the dataset on the official test set for multiple image encoders. Each cell classification module and the baseline SoftCTM model were trained on a limited amount of training data. Results are given for 5 experiments with different seeds. b Organ-wise detection results of the baseline SoftCTM model in comparison to the best performing $\text{CellViT}^{{ ++}}$ model, again trained on limited dataset sizes.
Figure 3: Experimental evaluation on colon tissue cell datasets. a Performance comparison of the $\text{CellViT}^{{ ++}}_\text{SAM-H}$ network, with and without data augmentation, across varying amounts of training slides, alongside baseline results from HoVer-Net and PointNu-Net (SOTA) on the CoNSeP dataset. The upper panel presents the mean panoptic quality (mPQ), while the lower panel depicts the number of nuclei in the datasets. Training data is incrementally increased from a single crop of 1 slide to 15 crops across 15 slides. b Nuclei-specific performance comparison on CoNSeP. c Average mPQ on the Lizard dataset compared to top-performing networks. Additionally, $\text{CellViT}^{{ ++}}_\text{SAM-H}$ is evaluated using both ViT token embeddings as cell features and classical nuclei features with deep learning and CatBoost classifiers. The HoVer-Net Cerberus is a re-trained version by cerberus. d Runtime and energy efficiency comparison of our network, trained on the CoNSeP and Lizard datasets, against HoVer-Net.
Figure 4: Experimental evaluation on breast cancer tissue datasets, including training on automatically derived lymphocytes and plasma cells from the SegPath dataset. a Comparison of $\text{F}_1$-score, precision, and recall for different $\text{CellViT}^{{ ++}}$ models on the NuCLS dataset with all cell types included. b $\text{CellViT}^{{ ++}}$ performance on the PanopTILs dataset for analyzing the tumor microenvironment in breast cancer. c Detection performance comparison of our network on the NuCLS test set, trained with automatically derived cells from the SegPath dataset versus fully supervised training on the NuCLS training dataset for lymphocytes and plasma cells. The lower panel shows the number of training cells in both datasets. For SegPath, $\text{CellViT}^{{ ++}}$ was applied to H&E slides, with the resulting cell contours mapped to the IHC mask to derive cell classes.
Figure 5: Suggested workflow for minimal human intervention training.
...and 3 more figures
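
The automated label derivation mentioned in the captions of Figure 1b and Figure 4c (cell contours from the H&E slide mapped onto a registered stain mask to derive cell classes) can be illustrated with a small sketch. It is a hedged example under assumptions, not the published pipeline: the overlap rule, the 0.5 threshold, the toy arrays, and the function name `labels_from_stain_mask` are all hypothetical, and the two images are assumed to be already registered pixel to pixel.

```python
import numpy as np

def labels_from_stain_mask(instance_map, stain_mask, min_overlap=0.5):
    """Label each segmented cell by its overlap with a binary stain mask.

    instance_map: 2D int array, 0 = background, k > 0 = pixels of cell k.
    stain_mask:   2D array of the same shape, nonzero where the marker is positive.
    Returns {cell_id: True/False}, True when the positive fraction of the
    cell's pixels is at least min_overlap (an assumed decision rule).
    """
    inst = np.asarray(instance_map).ravel()
    positive = (np.asarray(stain_mask).ravel() != 0).astype(float)
    n_ids = int(inst.max()) + 1
    area = np.bincount(inst, minlength=n_ids)                    # pixels per cell
    hits = np.bincount(inst, weights=positive, minlength=n_ids)  # positive pixels per cell
    return {
        cell_id: bool(hits[cell_id] / area[cell_id] >= min_overlap)
        for cell_id in range(1, n_ids)
        if area[cell_id] > 0
    }

# Toy example: two cells on a 6x6 tile.
instance_map = np.zeros((6, 6), dtype=int)
instance_map[0:2, 0:2] = 1      # cell 1, 4 pixels
instance_map[3:6, 3:6] = 2      # cell 2, 9 pixels

stain_mask = np.zeros((6, 6), dtype=np.uint8)
stain_mask[0:2, 0:2] = 1        # fully covers cell 1
stain_mask[3, 3] = 1            # touches a single pixel of cell 2

derived = labels_from_stain_mask(instance_map, stain_mask)
```

In this toy case cell 1 is fully covered by the mask and cell 2 only marginally, so a threshold on the overlap fraction separates them. In practice the threshold and the handling of registration errors would need to be tuned against the actual data.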
