Morpho-Genomic Deep Learning for Ovarian Cancer Subtype and Gene Mutation Prediction from Histopathology

Gabriela Fernandes

TL;DR

Ovarian cancer's heterogeneity complicates diagnosis and treatment; this paper tackles the problem by predicting subtypes and gene mutations directly from H&E images. It introduces a morpho-genomic pipeline that fuses a ResNet-50 encoder and a Vision Transformer with handcrafted nuclear morphometry, trained on about 45,000 patches from TCGA-OV and Kaggle. The model achieves 84.2% subtype accuracy and AUCs of 0.82 for TP53, 0.76 for BRCA1, and 0.73 for ARID1A; morphometry alone fails to predict TP53, underscoring the value of deep features. Feature importance analysis links nuclear solidity and eccentricity to TP53 and identifies other morphometric cues for BRCA1 and ARID1A, with Grad-CAM supporting interpretability. These results point to a cost-effective path for precision histopathology and molecular prescreening, which could enable faster triage and better-informed sequencing decisions.

Abstract

Ovarian cancer remains one of the most lethal gynecological malignancies, largely due to late diagnosis and extensive heterogeneity across subtypes. Current diagnostic methods are limited in their ability to reveal underlying genomic variations essential for precision oncology. This study introduces a novel hybrid deep learning pipeline that integrates quantitative nuclear morphometry with deep convolutional image features to perform ovarian cancer subtype classification and gene mutation inference directly from Hematoxylin and Eosin (H&E) histopathological images. Using $\sim 45{,}000$ image patches sourced from The Cancer Genome Atlas (TCGA) and public datasets, a fusion model combining a ResNet-50 Convolutional Neural Network (CNN) encoder and a Vision Transformer (ViT) was developed. This model successfully captured both local morphological texture and global tissue context. The pipeline achieved a robust overall subtype classification accuracy of $84.2\%$ (Macro AUC of $0.87 \pm 0.03$). Crucially, the model demonstrated the capacity for gene mutation inference with moderate-to-high accuracy: $\mathrm{AUC}_{\mathrm{TP53}} = 0.82 \pm 0.02$, $\mathrm{AUC}_{\mathrm{BRCA1}} = 0.76 \pm 0.04$, and $\mathrm{AUC}_{\mathrm{ARID1A}} = 0.73 \pm 0.05$. Feature importance analysis established direct quantitative links, revealing that nuclear solidity and eccentricity were the dominant predictors for TP53 mutation. These findings validate that quantifiable histological phenotypes encode measurable genomic signals, paving the way for cost-effective, precision histopathology in ovarian cancer triage and diagnosis.
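
The abstract reports a macro AUC for subtype classification without spelling out the averaging. A common reading is the one-vs-rest AUC computed per class and then averaged with equal weight; the sketch below assumes that reading. The function names and the toy labels and scores are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_auc(y_true: Sequence[int], probs: Sequence[Sequence[float]]) -> float:
    """Unweighted mean of one-vs-rest AUCs over all classes."""
    n_classes = len(probs[0])
    per_class = []
    for c in range(n_classes):
        one_vs_rest = [1 if y == c else 0 for y in y_true]
        class_scores = [p[c] for p in probs]
        per_class.append(binary_auc(one_vs_rest, class_scores))
    return sum(per_class) / n_classes


if __name__ == "__main__":
    # Toy two-class example (hypothetical values, for illustration only).
    toy_labels = [0, 0, 1, 1]
    toy_probs = [[0.9, 0.1], [0.4, 0.6], [0.8, 0.2], [0.1, 0.9]]
    print(macro_auc(toy_labels, toy_probs))
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; a rank-based implementation would be preferable for tens of thousands of patches.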

Paper Structure

This paper contains 16 sections, 13 figures.

Figures (13)

  • Figure 1: Nuclear Area Distribution by Subtype. Violin plot illustrating the range and density of nuclear area measurements across different ovarian cancer subtypes. Highlights statistically significant differences in nuclear size variation (pleomorphism), particularly in high-grade lesions.
  • Figure 2: Feature Correlation Heatmap. Heatmap showing the internal correlation between the extracted handcrafted nuclear morphometric features (Area, Perimeter, Eccentricity, Solidity, Mean Intensity, etc.). Confirms expected relationships, such as the strong correlation between Area and Perimeter.
  • Figure 3: PCA of Deep Morphometric Features. Principal Component Analysis (PCA) scatter plot demonstrating the clustering of ovarian cancer subtypes in the morphometric feature space. Shows that nuclear geometry alone provides substantial discriminative power.
  • Figure 4: Confusion Matrix for Subtype Classification. Detailed performance of the hybrid CNN-ViT model in classifying the four major ovarian cancer subtypes. High values on the diagonal indicate successful classification, while off-diagonal values show instances of misclassification, particularly between Endometrioid and Clear-Cell subtypes.
  • Figure 5: Top Morphometric Predictors of Subtype. Feature importance scores for the handcrafted nuclear features in predicting histological subtype. Identifies Mean Intensity, Area, and Perimeter as the most salient features.
  • ...and 8 more figures
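
The handcrafted features named in the captions above (area, perimeter, eccentricity, solidity) can be illustrated with a small pure-Python sketch. It assumes each nucleus is already segmented into a list of (row, col) pixel coordinates and uses common textbook definitions: eccentricity from the second central moments of the pixel cloud, and solidity as pixel area over convex-hull area. The paper's actual extraction code is not shown here, so every name and definition below is an assumption.

```python
import math
from typing import Dict, Iterable, List, Tuple

Point = Tuple[int, int]  # (row, col)


def convex_hull(points: Iterable[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in boundary order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> int:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices: List[Point]) -> float:
    """Shoelace formula for a simple polygon."""
    n = len(vertices)
    twice_area = 0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def nuclear_features(pixels: Iterable[Point]) -> Dict[str, float]:
    """Area, crude perimeter, eccentricity, and solidity of one nucleus mask."""
    px = list(pixels)
    if not px:
        raise ValueError("empty nucleus mask")
    area = len(px)

    # Crude perimeter: count pixel edges that face the background.
    occupied = set(px)
    neighbours = ((1, 0), (-1, 0), (0, 1), (0, -1))
    perimeter = sum(
        1 for r, c in px for dr, dc in neighbours if (r + dr, c + dc) not in occupied
    )

    # Second central moments of the pixel coordinates.
    mean_r = sum(r for r, _ in px) / area
    mean_c = sum(c for _, c in px) / area
    s_rr = sum((r - mean_r) ** 2 for r, _ in px) / area
    s_cc = sum((c - mean_c) ** 2 for _, c in px) / area
    s_rc = sum((r - mean_r) * (c - mean_c) for r, c in px) / area

    # Eigenvalues of the 2x2 moment matrix give the squared axis lengths
    # (up to scale) of the ellipse with matching second moments.
    half_trace = (s_rr + s_cc) / 2.0
    root = math.sqrt(((s_rr - s_cc) / 2.0) ** 2 + s_rc ** 2)
    major, minor = half_trace + root, half_trace - root
    eccentricity = math.sqrt(max(0.0, 1.0 - minor / major)) if major > 0 else 0.0

    # Solidity: pixel area over the area of the hull of all pixel corners.
    corners = [(r + dr, c + dc) for r, c in px for dr in (0, 1) for dc in (0, 1)]
    solidity = area / polygon_area(convex_hull(corners))

    return {
        "area": float(area),
        "perimeter": float(perimeter),
        "eccentricity": eccentricity,
        "solidity": solidity,
    }
```

Under these definitions, a filled square mask gives eccentricity 0 and solidity 1, while an elongated or concave mask raises eccentricity or lowers solidity. Library implementations may define perimeter and hull area differently, so absolute values are not directly comparable.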