PD-L1 Classification of Weakly-Labeled Whole Slide Images of Breast Cancer

Giacomo Cignoni, Cristian Scatena, Chiara Frascarelli, Nicola Fusco, Antonio Giuseppe Naccarato, Giuseppe Nicoló Fanelli, Alina Sîrbu

TL;DR

This work tackles the challenge of quantifying PD-L1 positivity in breast cancer whole-slide images (WSIs) using weakly labeled data. It presents a two-phase approach (ROI identification followed by WSI-level PD-L1 classification) with two representation families, colour-distance histograms and convolutional autoencoder embeddings, each paired with ML classifiers. The models are evaluated on two clinical datasets, in two training configurations (training on one dataset and testing on the other, or combining the datasets), with and without artifact removal. Colour-distance histograms perform best in the cross-dataset configuration when artifacts are removed, while autoencoder embeddings are superior in the remaining settings, which involve greater data variability. The findings support the viability of weakly supervised WSI analysis for PD-L1 scoring and point to a modular framework that can adapt to future improvements and clinical workflows.
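The two training configurations can be made concrete with a small index-splitting sketch. This is a minimal illustration under stated assumptions, not the authors' evaluation protocol: each slide is tagged with a hypothetical `source` value (0 for the first centre, 1 for the second, external one), the combined configuration is assumed to use a single label-stratified hold-out split, and the helper names, `test_fraction` and `seed` are all illustrative.

```python
import numpy as np


def cross_dataset_split(source):
    """Configuration (1): train on the first centre, test on the second (external) one.

    source: per-slide centre tag, 0 for the first dataset and 1 for the second.
    """
    source = np.asarray(source)
    return np.flatnonzero(source == 0), np.flatnonzero(source == 1)


def combined_split(labels, test_fraction=0.2, seed=0):
    """Configuration (2): pool both centres and hold out a random, label-stratified test set.

    labels: per-slide WSI-level PD-L1 label (0 = negative, 1 = positive).
    test_fraction and seed are illustrative values, not taken from the paper.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train, test = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        # At least one slide per class goes to the test set.
        n_test = max(1, int(round(test_fraction * len(idx))))
        test.extend(idx[:n_test].tolist())
        train.extend(idx[n_test:].tolist())
    return np.array(sorted(train), dtype=int), np.array(sorted(test), dtype=int)


# Toy example: 8 slides from the first centre, 4 from the second, alternating labels.
source = np.array([0] * 8 + [1] * 4)
labels = np.array([0, 1] * 6)
train_1, test_1 = cross_dataset_split(source)
train_2, test_2 = combined_split(labels)
print(test_1)       # prints [ 8  9 10 11]
print(len(test_2))  # prints 2
```

In configuration (1) the test set contains only slides from the unseen centre, which is what exposes a model to domain shift; in configuration (2) both centres appear on both sides of the split.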

Abstract

Specific and effective breast cancer therapy relies on the accurate quantification of PD-L1 positivity in tumors, which appears as brown staining in high-resolution whole slide images (WSIs). However, retrieving and extensively labeling PD-L1-stained WSIs is a time-consuming and challenging task for pathologists, and it yields low reproducibility, especially for borderline images. This study aims to develop and compare models able to classify the PD-L1 positivity of breast cancer samples from WSI analysis, relying only on WSI-level labels. The task consists of two phases: identifying regions of interest (ROIs) and classifying tumors as PD-L1 positive or negative. For the latter, two model categories were developed, with different feature extraction methodologies. The first encodes images based on their colour distance from a base colour. The second uses a convolutional autoencoder to obtain embeddings of WSI tiles and aggregates them into a WSI-level embedding. For both model types, the features are fed into downstream ML classifiers. Two datasets from different clinical centers were used in two training configurations: (1) training on one dataset and testing on the other; (2) combining the datasets. We also tested performance with and without human preprocessing to remove brown artefacts. Colour-distance-based models achieve the best performance in configuration (1) with artefact removal, while autoencoder-based models are superior in the remaining cases, which are subject to greater data variability.
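The abstract does not say how tile embeddings are aggregated into a WSI-level embedding. The sketch below assumes a simple order-independent pooling (element-wise mean over tiles concatenated with element-wise max), which also copes with slides that yield different numbers of ROI tiles. The function names are hypothetical, and the embedding dimension is whatever the encoder produces.

```python
import numpy as np


def aggregate_tiles(tile_embeddings):
    """Pool the per-tile CAE embeddings of one WSI into a single WSI-level vector.

    tile_embeddings: (n_tiles, embedding_dim) array, one row per ROI tile.
    Assumed aggregation: element-wise mean over tiles concatenated with
    element-wise max, giving a vector of length 2 * embedding_dim that does
    not depend on tile order or tile count.
    """
    tiles = np.asarray(tile_embeddings, dtype=float)
    if tiles.ndim != 2 or tiles.shape[0] == 0:
        raise ValueError("expected a non-empty (n_tiles, embedding_dim) array")
    return np.concatenate([tiles.mean(axis=0), tiles.max(axis=0)])


def build_feature_matrix(slides):
    """Stack one aggregated vector per WSI; slides may contain different numbers of tiles."""
    return np.stack([aggregate_tiles(tiles) for tiles in slides])


# Toy example with 2-dimensional embeddings: slide A has two tiles, slide B has one.
X = build_feature_matrix([[[0.0, 1.0], [2.0, 3.0]], [[4.0, 0.0]]])
# Row 0 is mean [1, 2] followed by max [2, 3]; row 1 is [4, 0, 4, 0].
print(X)
```

The resulting matrix has one row per WSI, so it can be paired with the WSI-level labels and passed to a standard classifier, for example any scikit-learn estimator via `fit(X, y)`.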

Paper Structure (15 sections, 6 figures, 2 tables, 1 algorithm)

Figures (6)

  • Figure 1: Examples of WSIs in the datasets. WSIs in the top row are from the first dataset, those in the bottom row from the second dataset (external test set). Artifacts are clearly visible in some of these examples.
  • Figure 2: Pipeline: ROI identification, WSI representation and final classification.
  • Figure 3: Histograms (100 bins, logarithmic scale) of a positive and a negative WSI. For the positive slide, the fraction of pixels in the lowest part of the distribution is higher.
  • Figure 4: Structure of the convolutional autoencoder (CAE). The encoder uses convolutions with max pooling, while the decoder uses padding. Batch normalization and small convolutional and deconvolutional kernels (3×3, 5×5) are used. The final embedding (purple) is extracted after the two fully connected encoder layers.
  • Figure 5: A WSI with two dark artifacts, compared with the ROIs resulting from the identification processes with artifact reduction.
  • ...and 1 more figure
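The colour-distance representation illustrated in Figure 3 can be sketched as follows. This is a minimal sketch assuming Euclidean distance in 8-bit RGB space and a hypothetical brown `BASE_COLOUR`; neither the colour space nor the base colour value is given in this section. The 100 bins match the Figure 3 caption, while applying the logarithm to the feature vector itself (rather than only to the plot) is an assumption exposed through the `log` flag.

```python
import numpy as np

# Hypothetical base colour: a brown, DAB-like RGB value chosen for illustration only.
BASE_COLOUR = np.array([120.0, 70.0, 30.0])
# Largest possible Euclidean distance between two 8-bit RGB colours.
MAX_DIST = float(np.sqrt(3.0) * 255.0)


def colour_distance_histogram(pixels, base=BASE_COLOUR, n_bins=100, log=True, eps=1e-6):
    """Encode ROI pixels as a histogram of their distances from a base colour.

    pixels: array of 8-bit RGB values with a trailing dimension of 3,
            e.g. (n_pixels, 3) or (height, width, 3).
    Returns n_bins values: the fraction of pixels falling in each distance bin,
    optionally log10-scaled (eps keeps empty bins finite).
    """
    px = np.asarray(pixels, dtype=float).reshape(-1, 3)
    dist = np.linalg.norm(px - base, axis=1)  # per-pixel distance to the base colour
    counts, _ = np.histogram(dist, bins=n_bins, range=(0.0, MAX_DIST))
    fractions = counts / max(len(dist), 1)
    return np.log10(fractions + eps) if log else fractions


# Toy example: a "positive-like" patch with some brown pixels vs. a patch with none.
brown = np.tile(BASE_COLOUR, (30, 1))
blue = np.tile([60.0, 80.0, 200.0], (70, 1))
positive_like = colour_distance_histogram(np.vstack([brown, blue]), log=False)
negative_like = colour_distance_histogram(blue, log=False)
print(positive_like[0], negative_like[0])  # prints 0.3 0.0 (fraction in the lowest-distance bin)
```

As in the Figure 3 comparison, the slide with brown staining puts a larger fraction of its pixels in the lowest-distance bins, and that fixed-length vector is what a downstream classifier would consume.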