Histopathological Image Classification with Cell Morphology Aware Deep Neural Networks

Andrey Ignatov, Josephine Yates, Valentina Boeva

TL;DR

The paper tackles data scarcity in histopathological image classification by introducing DeepCMorph, a two-module network that explicitly learns cell morphology. The segmentation module jointly performs nuclei segmentation and cell-type annotation; the classification module takes the segmentation outputs together with the raw image and predicts the tissue class, which enables effective transfer from large-scale pre-training on Pan-Cancer TCGA to smaller datasets. Experiments show state-of-the-art performance on Pan-Cancer TCGA (82.7% accuracy) and strong generalization to the NCT-CRC-HE, CRC8, and UniToPatho datasets, with notable robustness to batch effects attributed to extreme data augmentation. The fully convolutional design handles arbitrary image sizes, and the authors release code and pre-trained models to facilitate broader adoption of cell-morphology-aware histopathology analysis.

Abstract

Histopathological images are widely used for the analysis of diseased (tumor) tissues and for patient treatment selection. While most microscopy image processing was previously done manually by pathologists, recent advances in computer vision allow accurate recognition of lesion regions with deep learning-based solutions. Such models, however, usually require extensive annotated datasets for training, which are often unavailable for this task because the number of available patient samples is very limited. To deal with this problem, we propose a novel DeepCMorph model pre-trained to learn cell morphology and identify a large number of different cancer types. The model consists of two modules: the first performs cell nuclei segmentation and annotates each cell type, and is trained on a combination of 8 publicly available datasets to ensure high generalizability and robustness. The second module combines the obtained segmentation map with the original microscopy image and is trained for the downstream task. We pre-trained this module on the Pan-Cancer TCGA dataset consisting of over 270K tissue patches extracted from 8736 diagnostic slides from 7175 patients. The proposed solution achieved new state-of-the-art performance on this dataset, detecting 32 cancer types with over 82% accuracy and outperforming all previously proposed solutions by more than 4%. We demonstrate that the resulting pre-trained model can be easily fine-tuned on smaller microscopy datasets, yielding superior results compared to the current top solutions and models initialized with ImageNet weights. The code and pre-trained models presented in this paper are available at: https://github.com/aiff22/DeepCMorph
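
The two-module data flow described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that): `segmentation_module` and `classification_module` are hypothetical stand-ins with toy logic, the channel counts (one nuclei map plus six cell-type maps, following the six classes named in the Figure 2 caption) are assumptions, and the weights are random. The sketch only shows how segmentation outputs might be stacked with the RGB patch, and why a global-pooling head makes the pipeline independent of the input size.

```python
import numpy as np

NUM_CELL_TYPES = 6     # assumption: the six cell classes named in the Figure 2 caption
NUM_CANCER_TYPES = 32  # number of cancer types stated in the abstract


def segmentation_module(image):
    """Hypothetical stand-in for module 1: returns a nuclei map and cell-type maps."""
    intensity = image.mean(axis=-1, keepdims=True)           # (H, W, 1)
    nuclei = (intensity < 0.5).astype(np.float32)            # toy rule: dark pixels are "nuclei"
    cell_types = np.repeat(nuclei, NUM_CELL_TYPES, axis=-1)  # (H, W, 6), toy per-class maps
    return nuclei, cell_types / NUM_CELL_TYPES


def stack_inputs(image, nuclei, cell_types):
    """Concatenate the RGB patch with the segmentation outputs along the channel axis."""
    return np.concatenate([image, nuclei, cell_types], axis=-1)


def classification_module(stacked, weights):
    """Hypothetical stand-in for module 2: global average pooling, linear layer, softmax."""
    pooled = stacked.mean(axis=(0, 1))   # (C,), independent of H and W
    logits = pooled @ weights            # (NUM_CANCER_TYPES,)
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()


def run_pipeline(size, seed=0):
    """Run both stand-in modules on a random square patch of the given side length."""
    rng = np.random.default_rng(seed)
    image = rng.random((size, size, 3), dtype=np.float32)
    weights = rng.normal(size=(3 + 1 + NUM_CELL_TYPES, NUM_CANCER_TYPES))
    nuclei, cell_types = segmentation_module(image)
    stacked = stack_inputs(image, nuclei, cell_types)
    return stacked.shape, classification_module(stacked, weights)


if __name__ == "__main__":
    for size in (224, 300):  # different patch sizes pass through the same code path
        shape, probs = run_pipeline(size)
        print(size, shape, probs.shape)
```

Because the classifier head pools over the spatial dimensions before the linear layer, the same weights apply to patches of any size, which mirrors the arbitrary-image-size property mentioned in the TL;DR.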

Paper Structure

This paper contains 16 sections, 2 equations, 4 figures, 6 tables.

Figures (4)

  • Figure 1: Overview of the proposed DeepCMorph network architecture. The model consists of two separate modules: the first one performs nuclei segmentation and cell type annotation. Its outputs are then stacked together with the original histopathology image and are passed to the second module performing the final classification task.
  • Figure 2: Sample images used for training the DeepCMorph segmentation module. Top row: original H&E stained image patches; middle row: target nuclei segmentation maps; bottom row: cell annotation maps. In the annotation maps, red encodes lymphocytes, green epithelial cells, blue plasma cells, orange neutrophils, magenta eosinophils, and yellow connective tissue.
  • Figure 3: Sample H&E stained image patches for 32 different cancer types from the Pan-Cancer TCGA dataset (Komura et al., 2022).
  • Figure 4: The original image patch (top left, denoted by blue frame) and training patches generated by the proposed data augmentations.