Cervical Cancer Detection Using Multi-Branch Deep Learning Model
Tatsuhiro Baba, Abu Saleh Musa Miah, Jungpil Shin, Md. Al Mehedi Hasan
TL;DR
The paper tackles automated cervical cancer detection from Pap smear images by introducing a hybrid CNN-MHSA architecture that combines a Grain Module with an MHSA-based stream and a CNN module, followed by an IRFFN-like classification head. The approach captures local texture with the CNN and global context with MHSA, aided by a lightweight attention design and patch-based processing. On the SIPaKMeD dataset, the model achieves 98.522% accuracy, outperforming several state-of-the-art baselines and suggesting potential for broader medical image recognition tasks. The work points toward fast, accurate, and automated cervical cancer screening, with implications for clinical deployment and for future research in multimodal medical image analysis.
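The TL;DR mentions patch-based processing without specifying how the patches are formed. A minimal sketch, assuming non-overlapping square patches on a single-channel image whose sides are divisible by the patch size, shows how an image becomes a sequence of tokens that an attention stream can consume. The function name `to_patch_tokens` and the patch size are illustrative assumptions, not details taken from the paper. Transformer-style models usually also apply a learned linear projection to each flattened patch, which is omitted here.

```python
from typing import List

Image = List[List[float]]  # a single-channel image stored as rows of pixel values


def to_patch_tokens(image: Image, patch: int) -> List[List[float]]:
    """Split an H x W image into non-overlapping patch x patch tiles and
    flatten each tile (row-major) into one token vector.

    Assumption (hypothetical helper): H and W are divisible by `patch`;
    real pipelines often resize or pad first.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    tokens = []
    # Walk the tiles left-to-right, top-to-bottom.
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            token = [
                image[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            ]
            tokens.append(token)
    return tokens


if __name__ == "__main__":
    img = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    toks = to_patch_tokens(img, 2)
    print(len(toks), toks[0])  # 4 [0.0, 1.0, 4.0, 5.0]
```

Flattening in a fixed row-major order gives every token the same length (patch * patch values), which is what allows each patch to be treated as one element of the attention sequence.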
Abstract
Cervical cancer, mainly triggered by persistent infection with high-risk human papillomavirus (HPV), remains a major global health challenge for women, and diagnosis rates among young women have risen from 10% to 40% over three decades. Although Pap smear screening is a widely used diagnostic method, visual analysis of the images is time-consuming and error-prone. Early detection can substantially improve patient outcomes. Over recent decades, researchers have applied machine learning techniques to medical images with promising results for cervical cancer detection, and more recently, deep learning techniques have achieved high accuracy, although challenges remain. This research proposes a novel approach to automating cervical cancer image classification that combines Multi-Head Self-Attention (MHSA) and convolutional neural networks (CNNs) in two streams to capture both global and local features of cervical images. MHSA helps the model focus on relevant regions of interest, while the CNN extracts hierarchical features that contribute to accurate classification. The features of the two streams are then combined and fed into a classification module that refines them and produces the final prediction. We evaluated the approach on the SIPaKMeD dataset, which labels cervical cells in five categories, where the model achieved an accuracy of 98.522%. This high recognition accuracy suggests that the approach may also be applicable to other medical image recognition tasks.
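The abstract describes two streams whose features are combined before classification, but it does not state the exact fusion operator. The sketch below, in plain Python with no deep learning framework, illustrates the general idea under stated assumptions: standard scaled dot-product multi-head self-attention (as in the original Transformer, with the output projection omitted) lets every token attend to every other token; the attention output is mean-pooled and concatenated with a stand-in CNN feature vector; and a linear layer plus softmax maps the result to five classes, matching the five SIPaKMeD categories. Mean pooling, concatenation, the random weights, and all function names are hypothetical choices for illustration. The Grain Module and IRFFN-like head of the actual model are not reproduced.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product of an (n x k) matrix and a (k x m) matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax (subtracts the maximum before exponentiating)."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_self_attention(
    x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix, heads: int
) -> Matrix:
    """Scaled dot-product self-attention over n tokens of width d.

    wq, wk, wv are d x d projections; head h uses columns [h*dh, (h+1)*dh).
    The output projection W^O is omitted to keep the sketch short.
    """
    d = len(x[0])
    if d % heads:
        raise ValueError("token width must be divisible by the number of heads")
    dh = d // heads
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    out = [[0.0] * d for _ in x]
    for h in range(heads):
        lo, hi = h * dh, (h + 1) * dh
        for i, qi in enumerate(q):
            # Token i scores every token j, giving each output a global view.
            scores = [
                sum(a * b for a, b in zip(qi[lo:hi], kj[lo:hi])) / math.sqrt(dh)
                for kj in k
            ]
            for j, weight in enumerate(softmax(scores)):
                for c in range(lo, hi):
                    out[i][c] += weight * v[j][c]
    return out


def fuse_and_classify(cnn_feat: List[float], attn_tokens: Matrix, w_cls: Matrix) -> List[float]:
    """Hypothetical fusion: mean-pool the attention tokens, concatenate them with
    the CNN feature vector, and map the result to class probabilities."""
    pooled = [sum(col) / len(attn_tokens) for col in zip(*attn_tokens)]
    fused = cnn_feat + pooled
    logits = matmul([fused], w_cls)[0]
    return softmax(logits)


def random_matrix(rows: int, cols: int) -> Matrix:
    """Small random weights standing in for trained parameters."""
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    random.seed(0)
    n_tokens, width, n_classes = 4, 8, 5  # five cell categories, as in SIPaKMeD
    tokens = random_matrix(n_tokens, width)
    attended = multi_head_self_attention(
        tokens, random_matrix(width, width), random_matrix(width, width),
        random_matrix(width, width), heads=2,
    )
    cnn_feat = [0.5] * 6  # stand-in for a pooled CNN feature vector
    probs = fuse_and_classify(cnn_feat, attended, random_matrix(6 + width, n_classes))
    print(len(probs), round(sum(probs), 6))  # 5 1.0
```

Concatenation keeps the local (CNN) and global (attention) features in separate dimensions, so the classifier weights can learn how much to rely on each stream. Element-wise addition would be a cheaper alternative, but it requires both streams to have the same width.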
