Leveraging Machine Learning and Deep Learning Techniques for Improved Pathological Staging of Prostate Cancer
Raziehsadat Ghalamkarian, Marziehsadat Ghalamkarian, MortezaAli Ahmadi, Sayed Mohammad Ahmadi, Abolfazl Diyanat
TL;DR
This work tackles the problem of improving pathological staging for prostate cancer using RNA-seq data from TCGA. It combines feature selection, feature extraction, data augmentation, and SMOTE with a range of ML/DL models, highlighting Random Forest as the top performer with an F1-score of around $83\%$ and strong cross-validation results. Deep learning experiments show that full-dimensional data with augmentation yields the best accuracy ($71.23\%$), while PCA/ICA dimensionality reduction offers mixed benefits. Overall, the study indicates that AI-based methods can enhance staging accuracy and inform personalized treatment. Further validation in clinical/pathology settings and integration of additional omics data are suggested as future directions.
Abstract
Prostate cancer (PCa) remains a leading cause of cancer-related mortality in men. Traditional diagnostic methods such as the Digital Rectal Exam (DRE), Prostate-Specific Antigen (PSA) testing, and biopsy have limited precision, which makes accurate staging critical for improving treatment outcomes and patient prognosis. This study applies machine learning and deep learning approaches, together with feature selection and extraction methods, to improve PCa pathological staging predictions from RNA sequencing data in The Cancer Genome Atlas (TCGA). Gene expression profiles from 486 tumors were analyzed using Random Forest (RF), Logistic Regression (LR), Extreme Gradient Boosting (XGB), and Support Vector Machine (SVM) models. Performance is measured by the F1-score, precision, and recall, all calculated as weighted averages. The highest test F1-score, approximately 83%, was achieved by Random Forest, followed by Logistic Regression at 80%, while XGB and SVM both scored around 79%. Furthermore, deep learning models with data augmentation achieved an accuracy of 71.23%, while PCA-based dimensionality reduction reached an accuracy of 69.86%. This research highlights the potential of AI-driven approaches in clinical oncology, paving the way for more reliable diagnostic tools that can ultimately improve patient outcomes.
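To make the weighted-average metrics concrete, the following is a minimal sketch of how support-weighted precision, recall, and F1 can be computed for a multi-class staging task. It is an illustration under stated assumptions, not the authors' evaluation code: the function name `weighted_scores` and the toy stage labels are hypothetical, and the convention of scoring an undefined per-class precision as zero is an assumption.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Return support-weighted (precision, recall, f1) over the classes in y_true."""
    support = Counter(y_true)  # number of true samples per class
    n = len(y_true)
    precision = recall = f1 = 0.0
    for label, count in support.items():
        # True positives and total predictions for this class
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        # Assumption: a class that is never predicted gets precision 0
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        # Each class contributes in proportion to its share of the samples
        w = count / n
        precision += w * p_c
        recall += w * r_c
        f1 += w * f_c
    return precision, recall, f1


# Toy labels with hypothetical stage names (not data from the study)
y_true = ["T2", "T2", "T2", "T3", "T3", "T4"]
y_pred = ["T2", "T2", "T3", "T3", "T3", "T3"]
precision, recall, f1 = weighted_scores(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Under this weighting, recall coincides with overall accuracy, and the F1-score is dominated by the most frequent stages. This is one reason class rebalancing such as SMOTE can matter when some stages are rare.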
