
Context-aware Skin Cancer Epithelial Cell Classification with Scalable Graph Transformers

Lucas Sancéré, Noémie Moreau, Katarzyna Bozek

TL;DR

Evaluating several node feature configurations showed that the most informative representation combined morphological and texture features with the cell classes of non-epithelial cells, highlighting the importance of the surrounding cellular context.
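To make this feature configuration concrete, here is a minimal sketch of per-cell node features that concatenate morphological and texture descriptors with a one-hot cell class. The class vocabulary, the example descriptors, and the function name `build_node_features` are hypothetical. The sketch also assumes that epithelial cells are encoded only as "epithelial", so the healthy-versus-tumor label being predicted never leaks into the input, while non-epithelial cells expose their class as context.

```python
import numpy as np

# Hypothetical class vocabulary (assumption, not the paper's exact classes).
# Epithelial cells are encoded only as "epithelial", so the healthy-vs-tumor
# label being predicted never appears in the input features.
CLASSES = ["epithelial", "lymphocyte", "stroma", "other"]


def build_node_features(morph, texture, cell_class):
    """Return an (n, d_m + d_t + len(CLASSES)) node feature matrix.

    morph:      (n, d_m) shape descriptors, e.g. area and eccentricity
    texture:    (n, d_t) intensity or texture descriptors
    cell_class: (n,) integer indices into CLASSES
    """
    morph = np.asarray(morph, dtype=float)
    texture = np.asarray(texture, dtype=float)
    cell_class = np.asarray(cell_class, dtype=int)

    # One-hot encoding of each cell's class (the surrounding-context signal).
    one_hot = np.zeros((cell_class.size, len(CLASSES)))
    one_hot[np.arange(cell_class.size), cell_class] = 1.0

    # Column blocks: [morphology | texture | cell class]
    return np.concatenate([morph, texture, one_hot], axis=1)


morph = [[120.0, 0.8], [95.0, 0.4], [200.0, 0.9]]
texture = [[0.3], [0.7], [0.5]]
cell_class = [0, 1, 2]  # epithelial, lymphocyte, stroma
X = build_node_features(morph, texture, cell_class)
print(X.shape)  # (3, 7)
```

Dropping the last block of columns gives the morphology-and-texture-only configuration, which is one simple way to compare feature sets.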

Abstract

Whole-slide images (WSIs) from cancer patients contain rich information that can be used for medical diagnosis or to monitor treatment progress. To automate their analysis, numerous deep learning methods based on convolutional neural networks and Vision Transformers have been developed and have achieved strong performance in segmentation and classification tasks. However, due to the large size and complex cellular organization of WSIs, these models rely on patch-based representations and thereby lose vital tissue-level context. We propose using scalable Graph Transformers on a full-WSI cell graph for classification. We evaluate this methodology on a challenging task: the classification of healthy versus tumor epithelial cells in cutaneous squamous cell carcinoma (cSCC), where both cell types exhibit very similar morphologies and are therefore difficult for image-based approaches to differentiate. We first compared image-based and graph-based methods on a single WSI. The Graph Transformer models SGFormer and DIFFormer achieved balanced accuracies of $85.2 \pm 1.5$ ($\pm$ standard error) and $85.1 \pm 2.5$ in 3-fold cross-validation, respectively, whereas the best image-based method reached $81.2 \pm 3.0$. By evaluating several node feature configurations, we found that the most informative representation combined morphological and texture features with the cell classes of non-epithelial cells, highlighting the importance of the surrounding cellular context. We then extended our work to training on multiple WSIs from multiple patients. To address the computational constraints of image-based models, we extracted four $2560 \times 2560$-pixel patches from each image and converted them into graphs. In this setting, DIFFormer achieved a balanced accuracy of $83.6 \pm 1.9$ (3-fold cross-validation), while the state-of-the-art image-based model CellViT256 reached $78.1 \pm 0.5$.
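The evaluation protocol above (balanced accuracy, reported as mean $\pm$ standard error over 3-fold cross-validation, with all graphs of a patient kept in the same split as stated in the Figure 2 caption) can be sketched as follows. This is an illustration under assumptions: the round-robin patient-to-fold assignment, the standard error computed as sample standard deviation divided by $\sqrt{k}$, and the helper names are hypothetical, and the predictions are stand-ins for a model trained on the other folds.

```python
import numpy as np


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so a majority class cannot dominate."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))


def patient_grouped_folds(patient_ids, n_folds=3):
    """Give every sample the fold of its patient, so no patient is split.

    Hypothetical scheme: patients are dealt round-robin in sorted order.
    A real split would likely also balance classes across folds.
    """
    patients = sorted(set(patient_ids))
    fold_of = {p: i % n_folds for i, p in enumerate(patients)}
    return [fold_of[p] for p in patient_ids]


def mean_and_standard_error(scores):
    """Mean and standard error (sample std / sqrt(k)) over k fold scores."""
    scores = np.asarray(scores, dtype=float)
    return float(scores.mean()), float(scores.std(ddof=1) / np.sqrt(scores.size))


# Toy data: one entry per epithelial cell; label 0 = healthy, 1 = tumor.
patients = ["p1", "p1", "p2", "p2", "p3", "p3"]
labels = [0, 1, 0, 1, 1, 0]
preds = [0, 1, 1, 1, 1, 0]  # stand-in for predictions of a model trained on the other folds

folds = patient_grouped_folds(patients, n_folds=3)
scores = []
for k in range(3):
    test_idx = [i for i, f in enumerate(folds) if f == k]
    scores.append(balanced_accuracy([labels[i] for i in test_idx],
                                    [preds[i] for i in test_idx]))

mean, se = mean_and_standard_error(scores)
print(f"balanced accuracy: {mean:.3f} ± {se:.3f}")  # balanced accuracy: 0.833 ± 0.167
```

Balanced accuracy is the natural choice when healthy and tumor epithelial cells are unevenly represented, since plain accuracy would reward always predicting the majority class.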

Paper Structure

This paper contains 19 sections, 7 equations, 2 figures, and 5 tables.

Figures (2)

  • Figure 1: Description of WSI-Graph. (a) Steps to generate WSI-Graph. First, a WSI from a cSCC patient is segmented using SCC Hovernet, and tumor regions are annotated by an expert to refine the segmentation. From this segmentation map, we build a graph, simplify it around anchor nodes, and optionally split it with K-means on the centroid coordinate features. (b) Zoomed-in view of the graph, showing edges generated with a threshold distance of $r_0 = 50$ pixels, corresponding to $r_0 \approx 11.5\,\mu\text{m}$. (c) Number of edges, nodes, node features, and instances of each cell class before and after simplification (here with a maximum of $k=3$ hops).
  • Figure 2: Description of TILE-Graphs. (a) Steps to generate TILE-Graphs. First, patches extracted from tumor epithelial and healthy epithelial regions of cSCC patient samples are segmented using SCC Hovernet. Then, 372 graphs are built from these segmentation maps. (b) TILE-Graphs dataset statistics. The dataset includes 372 patches from 93 samples from 84 patients. During cross-validation, the 372 graphs are split so that all graphs from the same patient fall in the same split.
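The graph construction described for WSI-Graph can be sketched as follows. This is a minimal sketch under stated assumptions: edges connect every pair of cells whose centroids lie within $r_0 = 50$ pixels (the caption's 50 px ≈ 11.5 µm implies roughly 0.23 µm per pixel), and "simplification around anchor nodes" is taken to mean keeping only cells within $k$ hops of an anchor. Which cells serve as anchors (for example, the epithelial cells to classify) and the helper names `radius_graph` and `k_hop_subgraph` are hypothetical; the optional K-means split is omitted.

```python
from collections import deque

import numpy as np


def radius_graph(centroids, r0=50.0):
    """Undirected edges between cells whose centroids are at most r0 pixels apart.

    Uses O(n^2) pairwise distances, which is fine for a toy example; a full WSI
    would need a spatial index such as scipy.spatial.cKDTree.query_pairs.
    """
    pts = np.asarray(centroids, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Upper triangle only: drops self-loops and duplicate (j, i) pairs.
    i, j = np.nonzero(np.triu(dist <= r0, k=1))
    return list(zip(i.tolist(), j.tolist()))


def k_hop_subgraph(n_nodes, edges, anchors, k=3):
    """Keep nodes within k hops of any anchor node, and the edges among them."""
    adj = [[] for _ in range(n_nodes)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)

    # Multi-source BFS: every anchor starts at hop 0; expansion stops at hop k.
    hops = {a: 0 for a in anchors}
    queue = deque(anchors)
    while queue:
        u = queue.popleft()
        if hops[u] == k:
            continue
        for v in adj[u]:
            if v not in hops:
                hops[v] = hops[u] + 1
                queue.append(v)

    kept = set(hops)
    kept_edges = [(a, b) for a, b in edges if a in kept and b in kept]
    return sorted(kept), kept_edges


# Toy row of cells 40 px apart, plus one far-away cell (index 5).
centroids = [(0, 0), (40, 0), (80, 0), (120, 0), (160, 0), (400, 0)]
edges = radius_graph(centroids, r0=50.0)
nodes, sub_edges = k_hop_subgraph(len(centroids), edges, anchors=[0], k=3)
print(edges)  # [(0, 1), (1, 2), (2, 3), (3, 4)]
print(nodes)  # [0, 1, 2, 3]
```

A multi-source breadth-first search computes the minimum hop distance to the nearest anchor in a single pass, so the cost of simplification grows with the number of kept nodes and edges rather than with the number of anchors.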