Predicting Survivability of Cancer Patients with Metastatic Patterns Using Explainable AI
Polycarp Nalela, Deepthi Rao, Praveen Rao
TL;DR
This study addresses prognostic prediction for cancer patients with metastatic patterns by leveraging the MSK-MET pan-cancer dataset. It compares five ML models, finds XGBoost to be the best performer with an AUC of 0.82, and uses SHAP to reveal interpretable predictors such as metastatic site count, tumor mutation burden, and fraction of genome altered. The authors extend the analysis with survival modeling (Kaplan-Meier, Cox PH, and XGBoost Survival Analysis) to connect predictions with time-to-event outcomes, demonstrating clinically actionable insights and cancer-type–specific differences. The work emphasizes explainability and the potential for personalized prognosis and treatment planning, highlighting both global patterns and cancer-specific nuances. Overall, the integrated predictive and survival framework offers a robust, interpretable tool to improve patient care in metastatic cancer settings.
Abstract
Cancer remains a leading global health challenge and a major cause of mortality. This study leverages machine learning (ML) to predict the survivability of cancer patients with metastatic patterns using the comprehensive MSK-MET dataset, which includes genomic and clinical data from 25,775 patients across 27 cancer types. We evaluated five ML models (XGBoost, Naïve Bayes, Decision Tree, Logistic Regression, and Random Forest), tuning their hyperparameters with grid search. XGBoost emerged as the best performer with an area under the curve (AUC) of 0.82. To enhance model interpretability, SHapley Additive exPlanations (SHAP) were applied, revealing key predictors such as metastatic site count, tumor mutation burden, fraction of genome altered, and organ-specific metastases. Further survival analysis using Kaplan-Meier curves, Cox Proportional Hazards models, and XGBoost Survival Analysis identified significant predictors of patient outcomes, offering actionable insights for clinicians. These findings could aid in personalized prognosis and treatment planning, ultimately improving patient care.
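
For readers less familiar with the AUC metric used to rank the models, the following minimal sketch computes it directly from its pairwise definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not the study's pipeline or data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 outcomes (1 = positive class).
    scores: iterable of model scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Toy example (invented values, for illustration only).
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.3, 0.6, 0.4, 0.5, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. This quadratic-time version is only meant to make the definition concrete; library implementations typically use a faster sort-based computation.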
