Leveraging Machine Learning and Deep Learning Techniques for Improved Pathological Staging of Prostate Cancer

Raziehsadat Ghalamkarian, Marziehsadat Ghalamkarian, MortezaAli Ahmadi, Sayed Mohammad Ahmadi, Abolfazl Diyanat

TL;DR

This work tackles the problem of improving pathological staging for prostate cancer using RNA-seq data from TCGA. It combines feature selection, feature extraction, data augmentation, and SMOTE with a range of ML/DL models, highlighting Random Forest as the top performer with a weighted $F1$ score of around $83\%$ and strong cross-validation results. Deep learning experiments show that full-dimensional data with augmentation yields the best deep learning accuracy ($71.23\%$), while PCA/ICA dimensionality reduction offers mixed benefits. Overall, the study demonstrates that AI-based methods can enhance staging accuracy and inform personalized treatment. Suggested future directions are further validation in clinical/pathology settings and integration of additional omics data.

Abstract

Prostate cancer (PCa) continues to be a leading cause of cancer-related mortality in men. Traditional diagnostic methods such as the Digital Rectal Exam (DRE), Prostate-Specific Antigen (PSA) testing, and biopsies have limited precision, which underscores the importance of accurate staging for enhancing treatment outcomes and improving patient prognosis. This study leverages machine learning and deep learning approaches, along with feature selection and extraction methods, to enhance PCa pathological staging predictions using RNA sequencing data from The Cancer Genome Atlas (TCGA). Gene expression profiles from 486 tumors were analyzed using Random Forest (RF), Logistic Regression (LR), Extreme Gradient Boosting (XGB), and Support Vector Machine (SVM). Model performance is measured by the F1-score, precision, and recall, all calculated as weighted averages. The highest test F1-score, approximately 83%, was achieved by Random Forest, followed by Logistic Regression at 80%, while both XGB and SVM scored around 79%. Furthermore, deep learning models with data augmentation achieved an accuracy of 71.23%, while PCA-based dimensionality reduction reached an accuracy of 69.86%. This research highlights the potential of AI-driven approaches in clinical oncology, paving the way for more reliable diagnostic tools that can ultimately improve patient outcomes.
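The abstract reports precision, recall, and F1 as weighted averages. As a minimal sketch of what that means, the following pure-Python function computes each metric per class and then averages them with weights proportional to class support (the number of true samples in each class). The function name `weighted_scores` and the toy stage labels are hypothetical illustrations, not the authors' code or data; the paper's own pipeline uses scikit-learn, whose `average="weighted"` option follows the same definition.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Return support-weighted (precision, recall, F1) over the classes in y_true."""
    support = Counter(y_true)          # number of true samples per class
    total = len(y_true)
    precision = recall = f1 = 0.0
    for label, n_true in support.items():
        # True positives: samples of this class predicted as this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        p_c = tp / n_pred if n_pred else 0.0   # per-class precision (0 if never predicted)
        r_c = tp / n_true                      # per-class recall
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = n_true / total                # class share of the test set
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return precision, recall, f1


# Toy labels for illustration only (hypothetical, not taken from the paper).
y_true = ["T2", "T2", "T2", "T3", "T3", "T4"]
y_pred = ["T2", "T2", "T3", "T3", "T3", "T3"]
precision, recall, f1 = weighted_scores(y_true, y_pred)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

Because the weights follow class frequency, a rare stage that the model never predicts (here the single `T4` sample) lowers the score only in proportion to its share of the test set. This is one reason weighted averages can look favorable on imbalanced staging data, and why weighted recall coincides with plain accuracy.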

Paper Structure

This paper contains 44 sections, 11 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: Three methods of prostate cancer staging: the Prostate-specific Antigen (PSA), Gleason Score, and the AJCC TNM system, where AJCC stands for the American Joint Committee on Cancer. This system categorizes the cancer based on Tumor (T) size and extent, Nodes (N) for lymph node involvement, and Metastasis (M) for the presence of distant spread.
  • Figure 2: Overview of the strategy employed in this paper. Data was obtained from the TCGA data portal, where gene expression values from tumors served as the descriptor variables. Class labels were derived from the clinical details provided in the Biotab section of the TCGA data portal. The processed data was then used to create training and testing datasets. Feature selection and extraction techniques were applied, and classification models were developed using the scikit-learn library in Python.
  • Figure 3: Volcano plot of differential gene expression, highlighting up-regulated, down-regulated, and non-significant genes.
  • Figure 4: Evaluation of test results for model performance with and without feature selection.
  • Figure 5: Assessment of model performance with and without feature selection utilizing 10-fold cross-validation.
  • ...and 1 more figure