Immunocto: a massive immune cell database auto-generated for histopathology

Mikaël Simard, Zhuoyan Shen, Konstantin Bräutigam, Rasha Abu-Eid, Maria A. Hawkins, Charles-Antoine Collins-Fekete

TL;DR

Immunocto addresses the challenge of scalable immune cell labeling in histopathology by auto-generating a large database of immune cell subtypes from co-registered H&E and multiplexed immunofluorescence (IF) data, using the Segment Anything Model (SAM) for segmentation and a supervised classifier for labeling. The authors demonstrate state-of-the-art lymphocyte detection when training on Immunocto-derived labels and show that subtyping of CD4+ T cells, CD8+ T cells, CD20+ B cells, and macrophages is feasible from H&E with IF-guided ground truth. They validate data quality through SAM segmentation benchmarks, IF-space label separability, and expert review, and compare H&E classifiers trained on Immunocto with baselines to illustrate improved performance and generalization. This openly available resource enables rapid development and benchmarking of computational pathology methods to study the tumour immune micro-environment (TIME) and predict immunotherapy response on routine H&E slides, with potential to scale to additional cell types and cancers.

Abstract

With the advent of novel cancer treatment options such as immunotherapy, studying the tumour immune micro-environment (TIME) is crucial to inform prognosis and understand potential response to therapeutic agents. A key approach to characterising the TIME may be to combine (1) digitised high-resolution optical microscopy images of hematoxylin and eosin (H&E) stained tissue sections obtained in routine histopathology examinations with (2) automated immune cell detection and classification methods. In this work, we introduce a workflow to automatically generate robust single cell contours and labels from tissue sections dually stained with H&E and multiplexed immunofluorescence (IF) markers. The approach harnesses the Segment Anything Model and requires minimal human intervention compared to existing single cell databases. With this methodology, we create Immunocto, a massive, automatically generated database of 6,848,454 human cells and objects, including 2,282,818 immune cells distributed across 4 subtypes: CD4$^+$ T cell lymphocytes, CD8$^+$ T cell lymphocytes, CD20$^+$ B cell lymphocytes, and CD68$^+$/CD163$^+$ macrophages. For each cell, we provide a $64\times64$ pixel H&E image at $40\times$ magnification, along with a binary mask of the nucleus and a label. The database, which is made publicly available, can be used to train models to study the TIME on routine H&E slides. We show that deep learning models trained on Immunocto achieve state-of-the-art performance for lymphocyte detection. The approach demonstrates the benefits of using matched H&E and IF data to generate robust databases for computational pathology applications.
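
The per-cell entry described in the abstract (a 64 by 64 H&E patch, a binary nucleus mask, and a label) can be pictured as a small record type. The sketch below is illustrative only: the class name `CellRecord`, the label strings, the `"other"` catch-all for non-immune cells and objects, and the nested-list image layout are all assumptions, not the database's actual on-disk format.

```python
from dataclasses import dataclass
from typing import List, Tuple

PATCH_SIZE = 64  # patch side length in pixels, as stated in the abstract

# The four immune subtypes named in the abstract. The label strings and the
# "other" catch-all for remaining cells and objects are assumptions.
IMMUNE_SUBTYPES = ("CD4", "CD8", "CD20", "macrophage")
OTHER = "other"

Pixel = Tuple[int, int, int]  # one RGB pixel


@dataclass
class CellRecord:
    """Hypothetical in-memory view of one database entry."""
    image: List[List[Pixel]]  # 64 x 64 RGB H&E patch
    mask: List[List[int]]     # 64 x 64 binary nucleus mask (0 or 1)
    label: str

    def __post_init__(self) -> None:
        # Reject patches or masks that are not 64 x 64.
        for name, grid in (("image", self.image), ("mask", self.mask)):
            if len(grid) != PATCH_SIZE or any(len(row) != PATCH_SIZE for row in grid):
                raise ValueError(f"{name} must be {PATCH_SIZE} x {PATCH_SIZE}")
        if self.label not in IMMUNE_SUBTYPES + (OTHER,):
            raise ValueError(f"unknown label: {self.label!r}")

    @property
    def is_immune(self) -> bool:
        return self.label in IMMUNE_SUBTYPES

    def nucleus_area(self) -> int:
        """Number of pixels inside the nucleus mask."""
        return sum(sum(row) for row in self.mask)


# Synthetic example: a blank patch with a 10 x 10 square "nucleus".
blank = [[(255, 255, 255)] * PATCH_SIZE for _ in range(PATCH_SIZE)]
square = [[1 if 20 <= r < 30 and 20 <= c < 30 else 0 for c in range(PATCH_SIZE)]
          for r in range(PATCH_SIZE)]
example = CellRecord(image=blank, mask=square, label="CD8")
```

In practice an array library would likely hold the image and mask; plain lists are used here only to keep the sketch dependency-free.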

Paper Structure

This paper contains 31 sections, 1 equation, 6 figures, and 4 tables.

Figures (6)

  • Figure 1: Workflow to create the Immunocto database.
  • Figure 2: (a) Iterative thresholding scheme. Given starting parameters $q$, $C$ and $L$, one can extract $L$ candidate cells that are potentially active in immunofluorescence channels $c_1, \ldots, c_K$. (b) Successive applications of the iterative scheme to extract candidate immune cell datasets. All applications of the iterative scheme use $L=5000$ and $q=99$.
  • Figure 3: Examples of the segmentation performance of various segmentation models on the Lizard test set for the identification of lymphocytes only.
  • Figure 4: Differentiation of immune cell subtypes in the Immunocto database based on average IF intensities within cells. For each panel, the compared subtypes are identified in the legend, and the axes $I_{c}$ represent the average intensity of IF channel $c$ inside a cell. Decision boundaries obtained from a logistic regression separating the two subtypes are also shown. Only 10% of the cells (randomly sampled) are shown for each subtype.
  • Figure 5: Confusion matrices showing the agreement between the Immunocto $V_1$ labels and two raters, for each label type. The uncertain label denotes examples where raters could not clearly identify a subtype.
  • ...and 1 more figure
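
The Figure 4 caption describes two steps: averaging an IF channel inside each cell to get $I_c$, then separating two subtypes with a logistic regression on those averages. A minimal sketch of that idea follows. It is not the authors' code: the function names, the plain gradient-descent fit, the learning rate and epoch count, and the synthetic intensities are all assumptions chosen to keep the example dependency-free.

```python
import math
import random


def mean_intensity(channel, mask):
    """Average IF intensity I_c over the pixels where the binary mask is 1."""
    total, count = 0.0, 0
    for row_vals, row_mask in zip(channel, mask):
        for value, inside in zip(row_vals, row_mask):
            if inside:
                total += value
                count += 1
    if count == 0:
        raise ValueError("mask is empty")
    return total / count


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(points, labels, lr=0.5, epochs=500):
    """Fit w1*x + w2*y + b by batch gradient descent on the log loss."""
    w1 = w2 = b = 0.0
    n = len(points)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for (x, y), t in zip(points, labels):
            err = _sigmoid(w1 * x + w2 * y + b) - t
            g1 += err * x
            g2 += err * y
            gb += err
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return w1, w2, b


def predict(params, point):
    """Return 1 if the point lies on the positive side of the boundary."""
    w1, w2, b = params
    return 1 if w1 * point[0] + w2 * point[1] + b > 0 else 0


# Synthetic (I_a, I_b) pairs for two hypothetical subtypes: subtype 0 is
# bright in channel a, subtype 1 is bright in channel b. These are NOT
# Immunocto measurements.
rng = random.Random(0)
points = [(rng.gauss(0.8, 0.05), rng.gauss(0.2, 0.05)) for _ in range(100)]
points += [(rng.gauss(0.2, 0.05), rng.gauss(0.8, 0.05)) for _ in range(100)]
labels = [0] * 100 + [1] * 100

params = fit_logistic(points, labels)
accuracy = sum(predict(params, p) == t for p, t in zip(points, labels)) / len(points)
```

The fitted decision boundary is the line $w_1 I_a + w_2 I_b + b = 0$, which corresponds to the kind of boundary drawn in each panel of the figure. With real data one would more likely use an established library implementation of logistic regression.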