Multimodal Deep Learning for Subtype Classification in Breast Cancer Using Histopathological Images and Gene Expression Data
Amin Honarmandi Shandiz
TL;DR
This work addresses breast cancer molecular subtyping by integrating histopathology images with gene expression data in a deep multimodal framework. It uses a two-branch architecture, with ResNet-50 image features in one branch and a dense gene encoder in the other, fused via cross-attention to classify tumors as BRCA.Luminal or BRCA.Basal/Her2. Empirically, gene expression alone is a strong predictor, images alone are weak, and cross-attention fusion performs best (F1=0.9379, PR-AUC=0.9948), outperforming both concatenation and late fusion. The findings support multimodal deep learning for robust, clinically relevant cancer subtyping, while acknowledging the limited dataset size and the need for external validation and further methodological development.
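To make the fusion step concrete, the following is a minimal sketch of scaled dot-product cross-attention in plain Python. It is not the authors' implementation: the learned query/key/value projections are omitted, the toy embeddings and the names `gene_token` and `image_tokens` are hypothetical, and the choice of the gene embedding as the query over image features is an assumption, since the summary does not state the attention direction.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attention(queries, keys, values):
    """Each query vector attends over all key vectors and returns a
    softmax-weighted sum of the corresponding value vectors."""
    scale = math.sqrt(len(keys[0]))
    dim_out = len(values[0])
    outputs = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_out)]
        )
    return outputs

# Hypothetical toy embeddings, assumed already projected to a shared dimension of 4.
gene_token = [0.2, -0.1, 0.4, 0.0]           # stands in for the dense gene encoder output
image_tokens = [                             # stands in for ResNet-50 feature vectors
    [0.1, 0.3, -0.2, 0.5],
    [0.0, -0.4, 0.1, 0.2],
    [0.3, 0.3, 0.3, -0.1],
]

# Assumption: the gene embedding is the query; image features are keys and values.
attended = cross_attention([gene_token], image_tokens, image_tokens)[0]

# One possible fused representation: the query concatenated with its attended context.
fused = gene_token + attended
```

Unlike plain concatenation, the attended vector here depends on how strongly each image feature matches the gene embedding, which is the kind of modality interaction that cross-attention fusion is meant to capture.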
Abstract
Molecular subtyping of breast cancer is crucial for personalized treatment and prognosis. Traditional classification approaches rely on a single modality, either histopathological images or gene expression profiling, which limits their predictive power. In this study, we propose a deep multimodal learning framework that integrates histopathological images and gene expression data to classify breast cancer into BRCA.Luminal and BRCA.Basal/Her2 subtypes. Our approach employs a ResNet-50 model for image feature extraction and fully connected layers for gene expression processing, with a cross-attention fusion mechanism to enhance interaction between the modalities. We conduct extensive experiments using five-fold cross-validation, demonstrating that our multimodal integration outperforms unimodal approaches in terms of classification accuracy, precision-recall AUC, and F1-score. Our findings highlight the potential of deep learning for robust and interpretable breast cancer subtype classification, paving the way for improved clinical decision-making.
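The evaluation protocol can be illustrated with a short sketch of five-fold cross-validation scored by F1. Several details here are assumptions rather than facts from the paper: the folds are stratified by class, the label encoding (1 for BRCA.Luminal, 0 for BRCA.Basal/Her2) and class counts are made up, and a trivial majority-class baseline stands in for the actual model.

```python
import random

def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs, keeping class ratios similar across folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        # Deal the shuffled members of each class round-robin into the folds.
        for position, index in enumerate(members):
            folds[position % k].append(index)
    for held_out in range(k):
        test_idx = sorted(folds[held_out])
        train_idx = sorted(i for f in range(k) if f != held_out for i in folds[f])
        yield train_idx, test_idx

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels: 1 = BRCA.Luminal, 0 = BRCA.Basal/Her2 (encoding is an assumption).
labels = [1] * 30 + [0] * 20

fold_scores = []
for train_idx, test_idx in stratified_kfold(labels, k=5):
    train_y = [labels[i] for i in train_idx]
    test_y = [labels[i] for i in test_idx]
    # Placeholder model: always predict the majority class seen in training.
    majority = max(set(train_y), key=train_y.count)
    predictions = [majority] * len(test_y)
    fold_scores.append(f1_score(test_y, predictions))

mean_f1 = sum(fold_scores) / len(fold_scores)
```

With these toy labels, every test fold holds six positives and four negatives, so the majority-class baseline scores an F1 of about 0.75 in each fold. A baseline like this gives a floor against which unimodal and multimodal models can be compared.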
