
From Histopathology Images to Cell Clouds: Learning Slide Representations with Hierarchical Cell Transformer

Zijiang Yang, Zhongwei Qiu, Tiancheng Lin, Hanqing Chao, Wanxing Chang, Yelin Yang, Yunshuo Zhang, Wenpei Jiao, Yixuan Shen, Wenbin Liu, Dongmei Fu, Dakai Jin, Ke Yan, Le Lu, Hui Jiang, Yun Bian

TL;DR

The paper tackles the problem of analyzing whole-slide histopathology images directly through cell spatial distributions, addressing the lack of cell-level annotations with a large-scale dataset, WSI-Cell5B, and a weakly supervised refinement workflow. It introduces CCFormer, a hierarchical Cell Cloud Transformer that uses Neighboring Information Embedding (NIE) and Hierarchical Spatial Perception (HSP) to model cell clouds across a slide, achieving competitive or state-of-the-art performance on survival prediction and cancer staging. Clinical insights are demonstrated via CPS and MCPS metrics, showing prognostic value in cell-type proportions and spatial patterns. This work provides a scalable, cell-centric representation for WSIs and demonstrates practical clinical utility for prognosis and staging, with potential for finer-grained cell typing in future work.

Abstract

Analyzing and modeling the spatial distribution of cells directly in histopathology whole slide images (WSIs) is clinically important and potentially very beneficial. However, most existing WSI datasets lack cell-level annotations because annotating gigapixel images is extremely costly. It therefore remains an open question whether deep learning models can analyze WSIs directly and effectively from the semantic perspective of cell distributions. In this work, we address these challenges with a large-scale WSI dataset containing more than 5 billion cell-level annotations, termed WSI-Cell5B, and a novel hierarchical Cell Cloud Transformer (CCFormer). WSI-Cell5B is built from 6,998 WSIs of 11 cancer types from The Cancer Genome Atlas (TCGA) Program, and every cell in each WSI is annotated with its coordinates and type. To the best of our knowledge, WSI-Cell5B is the first large-scale WSI-level dataset that integrates cell-level annotations. CCFormer, in turn, formulates the collection of cells in each WSI as a cell cloud and models the spatial distribution of those cells. Specifically, Neighboring Information Embedding (NIE) characterizes the distribution of cells within the neighborhood of each cell, and a novel Hierarchical Spatial Perception (HSP) module learns the spatial relationships among cells in a bottom-up manner. Clinical analysis indicates that WSI-Cell5B can be used to design cell-counting-based clinical evaluation metrics that effectively assess patients' survival risk. Extensive experiments on survival prediction and cancer staging show that learning from cell spatial distribution alone can already achieve state-of-the-art (SOTA) performance, with CCFormer strongly outperforming competing methods.
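The abstract describes NIE as summarizing the cells around each cell and HSP as aggregating spatial information bottom-up. As a minimal sketch of that data flow, and not the paper's implementation, the code below assumes a cell cloud is a list of (x, y, type_id) tuples, approximates NIE with the fraction of each cell type among neighbors within a fixed radius, and approximates HSP by averaging features inside square tiles and then repeatedly merging 2x2 blocks of tiles. In CCFormer both modules are learned; the function names and the `radius`, `base_size`, and `levels` parameters here are hypothetical.

```python
import math
from collections import defaultdict


def neighborhood_composition(cells, radius, num_types):
    """Hypothetical NIE stand-in: for each cell (x, y, type_id), return the
    fraction of each cell type among neighbours within `radius` (the cell
    itself excluded), followed by the neighbour count."""
    feats = []
    for i, (xi, yi, _) in enumerate(cells):
        counts = [0] * num_types
        # Brute-force O(N^2) search; slides with hundreds of thousands of
        # cells would need a spatial index instead.
        for j, (xj, yj, tj) in enumerate(cells):
            if i != j and math.hypot(xi - xj, yi - yj) <= radius:
                counts[tj] += 1
        n = sum(counts)
        fracs = [c / n if n else 0.0 for c in counts]
        feats.append(fracs + [float(n)])
    return feats


def mean_vec(vectors):
    """Element-wise mean of equal-length feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def hierarchical_pool(cells, feats, base_size, levels):
    """Hypothetical HSP stand-in: average cell features inside square tiles of
    side `base_size`, merge 2x2 blocks of tiles `levels` times (bottom-up),
    then average the remaining nodes into one slide-level vector."""
    tiles = defaultdict(list)
    for (x, y, _), f in zip(cells, feats):
        tiles[(int(x // base_size), int(y // base_size))].append(f)
    nodes = {key: mean_vec(fs) for key, fs in tiles.items()}
    for _ in range(levels):
        parents = defaultdict(list)
        for (gx, gy), f in nodes.items():
            parents[(gx // 2, gy // 2)].append(f)
        nodes = {key: mean_vec(fs) for key, fs in parents.items()}
    return mean_vec(list(nodes.values()))


# Toy cell cloud: (x, y, type_id) with two hypothetical cell types.
cloud = [(0, 0, 0), (1, 0, 0), (0, 1, 1), (10, 10, 1)]
features = neighborhood_composition(cloud, radius=1.5, num_types=2)
slide_vector = hierarchical_pool(cloud, features, base_size=4, levels=2)
print(slide_vector)
```

Plain averaging weights every tile equally regardless of how many cells it contains; a learned aggregator could instead decide how much each region contributes. The quadratic neighbor search is only for readability.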

Paper Structure

This paper contains 25 sections, 5 equations, 9 figures, 5 tables, and 1 algorithm.

Figures (9)

  • Figure 1: Comparison of histopathology datasets on the number of cells and WSIs. Our proposed WSI-Cell5B is the first WSI-level large-scale dataset integrating cell-level annotations, while existing datasets lack either cell-level annotations or WSIs for clinical endpoints.
  • Figure 3: Kaplan-Meier analyses on HNSC, KIRC and PAAD.
  • Figure 4: The pipeline and illustration of CCFormer. For each cell point (coordinates and type) in the cell cloud, Neighboring Information Embedding adds the statistical characteristics of its neighboring cells. Hierarchical Spatial Perception then progressively perceives and aggregates cell spatial distribution information across hierarchy levels. Finally, the resulting feature, which summarizes cell spatial distributions across the entire WSI, is used to predict clinical endpoints.
  • Figure 5: Toy example of NIE. A toy point set containing three categories is generated, features are extracted with NIE, and K-Means clustering is applied to those features. The results indicate that NIE features effectively distinguish points at different locations (boundaries, core regions, and outliers).
  • Figure 6: Comparison of cancer staging with SOTA methods on (a) BLCA and (b) COADREAD in Macro-F1 (↑, higher is better).
  • ...and 4 more figures
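Figure 3 reports Kaplan-Meier analyses, and the abstract states that cell counts from WSI-Cell5B support metrics for assessing survival risk. The sketch below shows one plausible way to wire up such an analysis: score each slide by the proportion of one cell type, split the cohort at the median score, and estimate a Kaplan-Meier curve S(t) = Π (1 − d_i / n_i) per group. The CPS and MCPS definitions are not given in this summary, so `cell_type_proportion`, the median split, and the toy cohort are hypothetical stand-ins. A log-rank test, typically reported alongside such curves, is omitted.

```python
import statistics


def cell_type_proportion(cell_types, target_type):
    """Hypothetical slide score: fraction of cells whose type equals `target_type`."""
    return sum(1 for t in cell_types if t == target_type) / len(cell_types)


def kaplan_meier(times, events):
    """Kaplan-Meier estimate S(t) = prod over event times t_i <= t of (1 - d_i / n_i).
    events[k] is 1 if death was observed at times[k], 0 if censored.
    Returns [(t_i, S(t_i))] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    k = 0
    while k < len(data):
        t = data[k][0]
        deaths = removed = 0
        # Subjects sharing time t (deaths and censorings) are all in the
        # risk set n_i at t and are removed from it afterwards.
        while k < len(data) and data[k][0] == t:
            deaths += data[k][1]
            removed += 1
            k += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def median_split(scores):
    """True = high-score group (strictly above the cohort median); ties go low."""
    med = statistics.median(scores)
    return [s > med for s in scores]


# Synthetic toy cohort: (per-cell type labels, follow-up time, event flag).
cohort = [
    ([0, 1, 1, 2], 12, 1),
    ([0, 0, 1, 2], 30, 0),
    ([1, 1, 1, 0], 8, 1),
    ([0, 2, 2, 2], 40, 1),
]
scores = [cell_type_proportion(types, target_type=1) for types, _, _ in cohort]
high = median_split(scores)
for label, flag in (("high", True), ("low", False)):
    group = [(t, e) for (_, t, e), h in zip(cohort, high) if h == flag]
    times, events = zip(*group)
    print(label, kaplan_meier(times, events))
```

Splitting at the median keeps the two groups balanced in size; a data-driven cutoff is also common but needs care to avoid overfitting the threshold to the cohort.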