CardioForest: An Explainable Ensemble Learning Model for Automatic Wide QRS Complex Tachycardia Diagnosis from ECG

Vaskar Chakma, Ju Xiaolin, Heling Cao, Xue Feng, Ji Xiaodong, Pan Haiyan, Gao Zhan

TL;DR

This work targets automatic WCT detection from ECG with an explainable ensemble approach. CardioForest, a Random Forest–based framework augmented by gradient-boosting techniques, demonstrates high diagnostic accuracy (mean accuracy $=0.9519$) and robust calibration when tested on the MIMIC-IV-ECG dataset, while SHAP explanations reveal clinically meaningful feature contributions, notably QRS duration. The study rigorously compares CardioForest against GBM, XGBoost, and LightGBM using 10-fold cross-validation, pairwise statistical tests, and stability analyses, underscoring not only performance but also interpretability and reliability in clinical settings. The results suggest that CardioForest can deliver real-time, transparent WCT diagnostics that align with clinical reasoning, with calibrated uncertainty enabling targeted expert review for ambiguous cases. The work also delineates deployment considerations, calibration, and future directions for broader arrhythmia detection and integration with deep learning insights.

Abstract

This study develops and evaluates an ensemble machine learning framework for the automatic detection of Wide QRS Complex Tachycardia (WCT) from ECG signals, emphasizing diagnostic accuracy and interpretability through Explainable AI. The proposed system integrates ensemble learning techniques: an optimized Random Forest, named CardioForest, alongside models such as XGBoost and LightGBM. The models were trained and tested on ECG data from the publicly available MIMIC-IV-ECG dataset. They were evaluated using accuracy, balanced accuracy, precision, recall, F1 score, ROC-AUC, and error measures (RMSE, MAE). In addition, SHAP (SHapley Additive exPlanations) was used to assess model explainability and clinical relevance. The CardioForest model performed best on all metrics, achieving a test accuracy of 95.19%, a balanced accuracy of 88.76%, a precision of 95.26%, a recall of 78.42%, and an ROC-AUC of 0.8886. SHAP analysis confirmed that the model ranks the most relevant ECG features, such as QRS duration, in accordance with clinical intuition, thereby fostering trust and usability in clinical practice. The findings establish CardioForest as a highly dependable and interpretable WCT detection model. Its combination of accurate predictions and transparent explanations makes it a valuable tool for helping cardiologists make timely and well-informed diagnoses, especially in high-stakes and emergency scenarios.
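To make the relationship between the reported classification metrics concrete, the following is a minimal sketch of how accuracy, balanced accuracy, precision, recall, and F1 follow from a binary confusion matrix under their standard definitions. The `Confusion` class, the `summarize` function, and the counts are hypothetical and chosen only for illustration; they are not the paper's code or its results.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion-matrix counts (WCT treated as the positive class)."""
    tp: int  # WCT recordings correctly flagged
    fp: int  # non-WCT recordings incorrectly flagged
    tn: int  # non-WCT recordings correctly passed
    fn: int  # WCT recordings missed


def summarize(c: Confusion) -> dict:
    """Compute the standard metrics from the four counts."""
    recall = c.tp / (c.tp + c.fn)          # sensitivity
    specificity = c.tn / (c.tn + c.fp)
    precision = c.tp / (c.tp + c.fp)
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)
    return {
        "accuracy": accuracy,
        # Balanced accuracy averages the per-class recalls.
        "balanced_accuracy": (recall + specificity) / 2,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts for an imbalanced test set (assumption, not paper data).
example = Confusion(tp=80, fp=5, tn=895, fn=20)
metrics = summarize(example)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

On imbalanced data, plain accuracy is dominated by the majority class, so it can sit well above balanced accuracy whenever recall is lower than specificity. That is the same qualitative pattern as the figures reported above (accuracy above balanced accuracy, with recall as the lowest value).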

Paper Structure

This paper contains 42 sections, 9 equations, 23 figures, 11 tables, 5 algorithms.

Figures (23)

  • Figure 1: An overview of the WCT prediction system using the MIMIC-IV ECG database, featuring preprocessing, ensemble machine learning models, cross-validation, and final prediction.
  • Figure 2: Temporal dynamics of ECG features showing rolling statistics (mean, standard deviation, and skewness) for RR interval and QRS duration across the time sequence.
  • Figure 3: Initialization parameters and preprocessing metadata for ECG signal analysis, showing default values (0.00-0.01) for subject identifiers, report fields, filtering parameters, and waveform annotation markers (P-onset, QRS complex). The WCT (Wide Complex Tachycardia) label indicators suggest the beginning of arrhythmia classification preprocessing.
  • Figure 4: Relationship between Principal Component 1 (x-axis) and Principal Component 2 (y-axis). The axis scaling (0-70) indicates the relative variance explained by each component in this dimensionality reduction visualization.
  • Figure 5: This boxplot illustrates the statistical distribution of QRS complex durations across all ECG recordings, showing median values, interquartile ranges, and outliers. The visualization helped validate measurement quality and identify extreme values requiring clinical review before feature selection.
  • ...and 18 more figures
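
The rolling statistics named in the Figure 2 caption (mean, standard deviation, and skewness over a sliding window) can be sketched as follows. This is an illustrative sketch only: the window size, the population (biased) estimators, the function names, and the sample RR-interval values are all assumptions, since this section does not specify how the paper computed them.

```python
from statistics import fmean, pstdev


def skewness(window):
    """Population (biased) skewness; returns 0.0 for a constant window."""
    m = fmean(window)
    s = pstdev(window)
    if s == 0:
        return 0.0
    return fmean([((x - m) / s) ** 3 for x in window])


def rolling_stats(values, window):
    """Return (mean, std, skewness) for each full sliding window."""
    out = []
    for end in range(window, len(values) + 1):
        chunk = values[end - window:end]
        out.append((fmean(chunk), pstdev(chunk), skewness(chunk)))
    return out


# Hypothetical RR intervals in milliseconds (not taken from MIMIC-IV-ECG).
rr_ms = [800.0, 810.0, 790.0, 805.0, 600.0, 590.0, 610.0]
stats = rolling_stats(rr_ms, window=3)
for mean, std, skew in stats:
    print(f"mean={mean:.1f} std={std:.2f} skew={skew:.3f}")
```

A sudden shortening of the RR interval, as in the hypothetical sequence above, shows up as a jump in the rolling standard deviation and a swing in skewness for the windows that straddle the change.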