Histopathological Image Classification with Cell Morphology Aware Deep Neural Networks
Andrey Ignatov, Josephine Yates, Valentina Boeva
TL;DR
The paper tackles data scarcity in histopathological image classification by introducing DeepCMorph, a two-module network that explicitly learns cell morphology. The segmentation module jointly performs nuclei segmentation and cell-type annotation. The classification module combines the segmentation outputs with the raw image to classify the tissue, which enables effective transfer from large-scale pretraining on Pan-Cancer TCGA to smaller datasets. Experiments show state-of-the-art performance on Pan-Cancer TCGA (82.7% accuracy) and strong generalization to the NCT-CRC-HE, CRC8, and UniToPatho datasets. The model is also notably robust to batch effects owing to extreme data augmentation. The fully convolutional design handles arbitrary image sizes, and the authors release code and pretrained models to encourage broader adoption of cell-morphology-aware histopathology analysis.
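The following minimal NumPy sketch illustrates the two-module data flow described above. It is not the authors' implementation. Both modules are replaced by hypothetical toy stand-ins (`toy_segmentation_module`, `toy_classification_module`) built from random 1x1 projections, and the number of cell-type channels (`NUM_CELL_TYPES = 7`) is an assumption. The sketch only shows how segmentation outputs can be stacked with the RGB channels, and how global average pooling yields a fixed-size prediction for any input resolution.

```python
import numpy as np

# Assumed channel counts; the real DeepCMorph output layout may differ.
NUM_CELL_TYPES = 7        # hypothetical number of cell-type channels
NUM_TISSUE_CLASSES = 32   # 32 cancer types, as in the Pan-Cancer TCGA setup

# Fixed random 1x1 projection standing in for a trained segmentation network.
_SEG_PROJ = np.random.default_rng(0).standard_normal((3, 1 + NUM_CELL_TYPES))


def toy_segmentation_module(image):
    """Return a nuclei probability map (H, W, 1) and per-pixel cell-type
    probabilities (H, W, NUM_CELL_TYPES) for an RGB image (H, W, 3)."""
    scores = image @ _SEG_PROJ                        # per-pixel 1x1 "convolution"
    nuclei = 1.0 / (1.0 + np.exp(-scores[..., :1]))   # sigmoid -> nuclei probability
    logits = scores[..., 1:]
    logits = logits - logits.max(axis=-1, keepdims=True)  # stabilize softmax
    exp = np.exp(logits)
    cell_types = exp / exp.sum(axis=-1, keepdims=True)    # softmax over cell types
    return nuclei, cell_types


def toy_classification_module(image, nuclei, cell_types, weights):
    """Stack raw RGB with segmentation outputs, pool spatially, apply a linear head."""
    features = np.concatenate([image, nuclei, cell_types], axis=-1)  # (H, W, 3 + 1 + C)
    pooled = features.mean(axis=(0, 1))  # global average pooling removes H and W
    return pooled @ weights              # (NUM_TISSUE_CLASSES,) logits


rng = np.random.default_rng(42)
head = rng.standard_normal((3 + 1 + NUM_CELL_TYPES, NUM_TISSUE_CLASSES))
for size in [(224, 224), (300, 517)]:
    img = rng.random((*size, 3))
    nuclei, cell_types = toy_segmentation_module(img)
    logits = toy_classification_module(img, nuclei, cell_types, head)
    print(size, logits.shape)  # logits have shape (32,) for every input size
```

Every spatial operation here is per-pixel and the only reduction is a spatial mean, so the output shape depends only on the number of classes. This mirrors, in a very simplified way, why a fully convolutional classifier can accept images of arbitrary size.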
Abstract
Histopathological images are widely used to analyze diseased (tumor) tissue and to select patient treatments. While most microscopy image analysis was previously done manually by pathologists, recent advances in computer vision enable accurate recognition of lesion regions with deep learning-based solutions. Such models, however, usually require large annotated training datasets, which are rarely available for this task, where the number of patient data samples is very limited. To address this problem, we propose DeepCMorph, a novel model pre-trained to learn cell morphology and to identify a large number of cancer types. The model consists of two modules. The first performs cell nuclei segmentation and annotates the type of each cell; it is trained on a combination of 8 publicly available datasets to ensure high generalizability and robustness. The second module combines the resulting segmentation map with the original microscopy image and is trained for the downstream task. We pre-trained this module on the Pan-Cancer TCGA dataset, which consists of over 270K tissue patches extracted from 8736 diagnostic slides from 7175 patients. The proposed solution achieved new state-of-the-art performance on this dataset, detecting 32 cancer types with over 82% accuracy and outperforming all previously proposed solutions by more than 4%. We demonstrate that the resulting pre-trained model can be easily fine-tuned on smaller microscopy datasets, yielding better results than both the current top solutions and models initialized with ImageNet weights. The code and pre-trained models presented in this paper are available at: https://github.com/aiff22/DeepCMorph
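Transfer to a smaller dataset can be sketched in its simplest form under explicit assumptions. The pretrained network is replaced by a hypothetical frozen stand-in (`backbone`, a fixed random projection with ReLU and L2 normalization), and only a new linear head is trained with softmax cross-entropy for a hypothetical 9-class target task. Strictly speaking this is linear probing. Fine-tuning as described in the abstract would also update the pretrained weights. The synthetic labels follow a hidden linear rule on the features, so a linear head can fit them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the pretrained network: a fixed random projection + ReLU.
# In the real model, weights from Pan-Cancer TCGA pretraining would be used instead.
IN_DIM, FEAT_DIM, NUM_NEW_CLASSES = 16, 32, 9
W_BACKBONE = rng.standard_normal((IN_DIM, FEAT_DIM))


def backbone(x):
    """Frozen feature extractor; L2-normalized features keep gradient descent stable."""
    feats = np.maximum(x @ W_BACKBONE, 0.0)
    return feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(feats, y, W, b):
    probs = softmax(feats @ W + b)
    return -np.mean(np.log(probs[np.arange(len(y)), y] + 1e-12))


def fine_tune_head(x, y, num_classes, lr=0.5, steps=500):
    """Fit a new linear head by gradient descent; W_BACKBONE is never modified."""
    feats = backbone(x)
    W = np.zeros((feats.shape[1], num_classes))  # new head replaces the 32-way pretraining head
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[y]
    for _ in range(steps):
        grad = (softmax(feats @ W + b) - onehot) / len(x)  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * feats.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b


# Usage: a small synthetic target dataset whose labels follow a hidden linear rule.
x_small = rng.standard_normal((200, IN_DIM))
W_hidden = rng.standard_normal((FEAT_DIM, NUM_NEW_CLASSES))
y_small = np.argmax(backbone(x_small) @ W_hidden, axis=1)
W_head, b_head = fine_tune_head(x_small, y_small, NUM_NEW_CLASSES)
loss = cross_entropy(backbone(x_small), y_small, W_head, b_head)
print(f"loss {loss:.3f} vs. chance-level {np.log(NUM_NEW_CLASSES):.3f}")
```

Because the head starts at zero, the initial loss equals log(9). The printed loss can therefore be compared directly against that chance-level baseline.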
