Hybrid deep learning-based strategy for the hepatocellular carcinoma cancer grade classification of H&E stained liver histopathology images

Ajinkya Deshpande, Deep Gupta, Ankit Bhurane, Nisha Meshram, Sneha Singh, Petia Radeva

TL;DR

The paper addresses the challenge of accurate hepatocellular carcinoma grading from H&E histopathology slides and introduces a hybrid deep learning framework that couples pre-trained CNN feature extractors with a deep ANN classifier using selective fine-tuning. It evaluates patch-based preprocessing with color normalization and augmentation on TCGA-LIHC, KMC LiverNet, and COLON datasets, employing five-fold cross-validation and a rigorous training pipeline. The results show that hybrid models outperform base pretrained models across datasets, achieving 100% accuracy and AUC of 1.00 on TCGA with ResNet50, 96.71% accuracy with EfficientNetb3 on KMC, and 100% on COLON with several hybrids, demonstrating strong cross-domain robustness. The findings suggest that enhancing the classifier depth while selectively tuning the top feature layers yields dataset-dependent gains, with practical implications for automated HCC grading in clinical histopathology.

Abstract

Hepatocellular carcinoma (HCC) is a common type of liver cancer whose early-stage diagnosis remains challenging, mainly because manual assessment of hematoxylin and eosin-stained whole slide images is time-consuming and may lead to variability in decision-making. For accurate detection of HCC, we propose a hybrid deep learning-based architecture that uses transfer learning to extract features from pre-trained convolutional neural network (CNN) models and a classifier made up of a sequence of fully connected layers. This study uses the publicly available The Cancer Genome Atlas Hepatocellular Carcinoma (TCGA-LIHC) database (n=491) for model development and the database of Kasturba Gandhi Medical College (KMC), India for validation. The pre-processing step involves patch extraction, colour normalization, and augmentation, which results in 3920 patches for the TCGA dataset. The developed hybrid deep neural network, consisting of a CNN-based pre-trained feature extractor and a customized artificial neural network-based classifier, is trained using five-fold cross-validation. For this study, eight different state-of-the-art models are trained and tested as feature extractors for the proposed hybrid model. The proposed hybrid model with a ResNet50-based feature extractor achieved sensitivity, specificity, F1-score, accuracy, and AUC of 100.00%, 100.00%, 100.00%, 100.00%, and 1.00, respectively, on the TCGA database. On the KMC database, EfficientNetb3 was the optimal choice of feature extractor, giving sensitivity, specificity, F1-score, accuracy, and AUC of 96.97%, 98.85%, 96.71%, 96.71%, and 0.99, respectively. The proposed hybrid models improved accuracy by 2% and 4% over the pre-trained models on the TCGA-LIHC and KMC databases, respectively.
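The abstract reports sensitivity, specificity, F1-score, and accuracy for a multi-class grading task. As a minimal sketch of how such figures can be derived from a confusion matrix, the snippet below computes per-class one-vs-rest values and averages them. The macro-averaging convention, the function name `summarize`, and the example matrix are assumptions for illustration; the text above does not state which averaging the authors used, and the numbers are not taken from the paper.

```python
def summarize(confusion):
    """Macro-averaged metrics from a square confusion matrix.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j. Macro averaging is an assumption
    made for this sketch, not a convention confirmed by the paper.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    sens, spec, f1s = [], [], []
    for k in range(n):
        # One-vs-rest counts for class k.
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp
        fp = sum(confusion[i][k] for i in range(n)) - tp
        tn = total - tp - fn - fp
        sens.append(tp / (tp + fn) if tp + fn else 0.0)
        spec.append(tn / (tn + fp) if tn + fp else 0.0)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    correct = sum(confusion[k][k] for k in range(n))
    return {
        "sensitivity": sum(sens) / n,
        "specificity": sum(spec) / n,
        "f1": sum(f1s) / n,
        "accuracy": correct / total if total else 0.0,
    }


# Hypothetical 3-class confusion matrix (illustrative values only).
example = [
    [48, 2, 0],
    [3, 45, 2],
    [0, 1, 49],
]
for name, value in summarize(example).items():
    print(f"{name}: {value:.4f}")
```

A perfectly diagonal matrix yields 1.0 for every metric, which corresponds to the 100.00% figures quoted for the TCGA database.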

Paper Structure

This paper contains 14 sections, 2 equations, 15 figures, 9 tables.

Figures (15)

  • Figure 1: Block diagram showing overall workflow of the experimentation
  • Figure 2: TCGA-LIHC liver HCC database. This diagram describes the preprocessing for TCGA dataset preparation, which involves patch extraction from the whole slide image, downsampling to balance the dataset, and data augmentation. The number of images obtained after preprocessing is given at the bottom.
  • Figure 3: The TCGA-LIHC, KMC and COLON datasets comprise 3, 4 and 2 types, respectively. This figure shows sample patches from each dataset.
  • Figure 4: i) Base model: All of the convolutional layers in the pre-trained model are frozen, and the last fully connected layer is replaced and kept trainable. ii) Hybrid model: The bottom convolutional layers of the pre-trained model are frozen. The shallow classifier is replaced with a deep classifier. The classifier and selected top layers of the pre-trained model are kept trainable.
  • Figure 5: The workflow of the training and validation of the proposed method
  • ...and 10 more figures
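
The difference between the base and hybrid models described in the Figure 4 caption comes down to which layers are trainable. The sketch below illustrates that policy on a toy list of layers using only the standard library. The `Layer` class, the function names, and the layer counts (`n_conv`, `n_unfrozen_top`, `n_dense`) are hypothetical; the caption does not say how many top layers are unfrozen or how deep the classifier is, and a real implementation would apply the same flags to an actual pre-trained network.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    kind: str            # "conv" for backbone layers, "dense" for classifier layers
    trainable: bool = True


def build_backbone(n_conv):
    """Stand-in for a pre-trained CNN: an ordered list of conv layers."""
    return [Layer(f"conv_{i}", "conv") for i in range(n_conv)]


def base_model(n_conv):
    """Base model: every conv layer frozen, one replaced trainable output layer."""
    layers = build_backbone(n_conv)
    for layer in layers:
        layer.trainable = False
    layers.append(Layer("fc_out", "dense", trainable=True))
    return layers


def hybrid_model(n_conv, n_unfrozen_top, n_dense):
    """Hybrid model: bottom conv layers frozen, top conv layers and a
    deeper dense classifier trainable. All counts are illustrative."""
    if not 0 <= n_unfrozen_top <= n_conv:
        raise ValueError("n_unfrozen_top must be between 0 and n_conv")
    layers = build_backbone(n_conv)
    cutoff = n_conv - n_unfrozen_top
    for index, layer in enumerate(layers):
        layer.trainable = index >= cutoff
    layers.extend(Layer(f"fc_{j}", "dense", trainable=True) for j in range(n_dense))
    return layers


def trainable_names(layers):
    return [layer.name for layer in layers if layer.trainable]


print("base:  ", trainable_names(base_model(5)))
print("hybrid:", trainable_names(hybrid_model(5, n_unfrozen_top=2, n_dense=3)))
```

In the base configuration only the replaced output layer receives gradient updates, whereas the hybrid configuration also updates the last few convolutional layers together with the deeper classifier.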