Deep Ensembling with Multimodal Image Fusion for Efficient Classification of Lung Cancer
Surochita Pal, Sushmita Mitra
TL;DR
This work tackles lung cancer classification from multimodal PET–CT slices under limited data. It introduces the DEMF framework, which pairs PCA-based fusion (PCAE) of CT and PET with a majority-voting ensemble of pretrained CNNs, and uses Grad-CAM for interpretability. Ablation studies and Grad-CAM analyses show that PCAE fusion and the ensemble approach outperform single-modality baselines and individual models, providing robust performance across three public datasets. The results demonstrate strong accuracy and interpretability, suggesting practical utility for efficient multimodal medical image analysis in data-scarce settings and potential extension to other diseases.
Abstract
This study focuses on classifying cancerous and healthy slices from multimodal lung images. The data comprise Computed Tomography (CT) and Positron Emission Tomography (PET) images. The proposed strategy fuses PET and CT images using Principal Component Analysis (PCA) and an autoencoder. Subsequently, a new ensemble-based classifier, Deep Ensembled Multimodal Fusion (DEMF), is developed, employing majority voting to classify the sample images under examination. Gradient-weighted Class Activation Mapping (Grad-CAM) is employed to visualize the image regions that drive the classification of cancer-affected images. Given the limited sample size, a random image augmentation strategy is employed during the training phase. The DEMF network helps mitigate the challenges of scarce data in computer-aided medical image analysis. The proposed network is compared with state-of-the-art networks across three publicly available datasets and outperforms them on Accuracy, F1-Score, Precision, and Recall. These results highlight the effectiveness of the proposed network.
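The majority-voting step of an ensemble classifier can be illustrated with a minimal sketch. It assumes each pretrained CNN has already produced one hard label per slice (0 = healthy, 1 = cancerous). The function name `majority_vote`, the label encoding, and the tie-breaking rule are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def majority_vote(predictions, n_classes=2):
    """Combine hard labels from several classifiers by majority vote.

    predictions: array-like of shape (n_models, n_samples) holding integer
                 class labels in [0, n_classes). Assumed encoding:
                 0 = healthy slice, 1 = cancerous slice.
    Returns an array of shape (n_samples,) with the winning label per sample.
    Ties are resolved in favor of the lowest label (an arbitrary choice
    made for this sketch).
    """
    preds = np.asarray(predictions)
    # counts[c, i] = number of models that voted class c for sample i
    counts = np.stack([(preds == c).sum(axis=0) for c in range(n_classes)])
    return counts.argmax(axis=0)


# Hypothetical votes from three models on four slices
votes = [
    [1, 0, 1, 0],  # model A
    [1, 1, 0, 0],  # model B
    [0, 1, 1, 0],  # model C
]
print(majority_vote(votes))  # [1 1 1 0]
```

With an odd number of models and two classes, ties cannot occur, which is one practical reason to build binary voting ensembles from an odd number of members.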
