Explainable machine learning for neoplasms diagnosis via electrocardiograms: an externally validated study
Juan Miguel Lopez Alcaraz, Wilhelm Haverkamp, Nils Strodthoff
TL;DR
This work tackles the need for non-invasive, accessible neoplasm diagnostics by leveraging ECG-derived features combined with tree-based machine learning and SHAP explainability. Using internal MIMIC-IV-ECG data and external ECG-VIEW-II validation, the study trains separate XGBoost classifiers for multiple ICD-10-CM neoplasm codes, achieving strong discrimination and calibration across cohorts. SHAP analyses reveal age and key ECG intervals as important predictors, providing interpretable insights into cardio-neoplasm interactions and potential biomarkers. The results demonstrate robust external validity and hold promise for scalable deployment in resource-limited settings, contributing to cardio-oncology by linking electrical cardiac signals with oncologic diagnoses.
Abstract
Background: Neoplasms are a major cause of mortality worldwide, and early diagnosis is essential for improving outcomes. Current diagnostic methods are often invasive, expensive, and inaccessible in resource-limited settings. This study explores the potential of electrocardiogram (ECG) data, which are widely available and non-invasive, for diagnosing neoplasms through cardiovascular changes linked to the presence of neoplasms. Methods: A diagnostic pipeline combining tree-based machine learning models with Shapley value analysis for explainability was developed. The models were trained and internally validated on a large dataset (MIMIC-IV-ECG) and externally validated on an independent cohort (ECG-VIEW-II) to assess robustness and generalizability. Key ECG features contributing to predictions were identified and analyzed. Results: The models achieved high diagnostic accuracy in both the internal test cohort and the external validation cohort. Shapley value analysis highlighted significant ECG features, including novel predictors. The approach is cost-effective, scalable, and suitable for resource-limited settings, and it offers insights into cardiovascular changes associated with neoplasms and their therapies. Conclusions: This study demonstrates the feasibility of using ECG signals and machine learning for non-invasive neoplasm diagnosis. By providing interpretable insights into cardio-neoplasm interactions, the method addresses gaps in current diagnostics and supports integration into broader diagnostic and therapeutic frameworks.
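To make the Shapley value analysis mentioned in the Methods more concrete, the following is a minimal sketch of how Shapley attributions can be computed exactly for a single prediction. It is not the study's pipeline: the study uses tree-based models with SHAP, whereas this toy example enumerates all feature coalitions by brute force, which is only practical for a handful of features. The scoring function `toy_risk_score`, the feature names (`age`, `qtc_ms`, `heart_rate`), the coefficients, and the reference patient are all hypothetical and chosen purely for illustration.

```python
from itertools import combinations
from math import factorial


def toy_risk_score(features):
    """Hypothetical stand-in for a trained classifier's output (not the study's model)."""
    score = 0.02 * (features["age"] - 50) + 0.01 * (features["qtc_ms"] - 420)
    # Illustrative interaction: extra risk only when both conditions hold.
    if features["age"] > 65 and features["heart_rate"] > 90:
        score += 0.5
    return score


def exact_shapley(model, instance, baseline):
    """Exact Shapley values for one instance by enumerating every coalition.

    Features outside a coalition are set to their baseline value, which is
    one common (assumed) way to represent a "missing" feature.
    """
    names = list(instance)
    n = len(names)
    values = {}
    for target in names:
        others = [name for name in names if name != target]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without = {
                    k: (instance[k] if k in coalition else baseline[k])
                    for k in names
                }
                with_target = dict(without)
                with_target[target] = instance[target]
                # Marginal contribution of the target feature to this coalition.
                total += weight * (model(with_target) - model(without))
        values[target] = total
    return values


# Hypothetical patient and reference values, for illustration only.
patient = {"age": 72, "qtc_ms": 460, "heart_rate": 95}
reference = {"age": 50, "qtc_ms": 420, "heart_rate": 70}

attributions = exact_shapley(toy_risk_score, patient, reference)
for name, value in attributions.items():
    print(f"{name}: {value:+.3f}")
```

The attributions sum to the difference between the score for the patient and the score for the reference, which is the property that makes Shapley values useful for decomposing a single prediction into per-feature contributions. In this toy setup the interaction term is split evenly between age and heart rate.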
