Detecting low left ventricular ejection fraction from ECG using an interpretable and scalable predictor-driven framework

Ya Zhou, Tianxiang Hao, Ziyi Cai, Haojie Zhu, Hejun He, Jia Liu, Xiaohan Fan, Jing Yuan

Abstract

Low left ventricular ejection fraction (LEF) frequently remains undetected until progression to symptomatic heart failure, underscoring the need for scalable screening strategies. Although artificial intelligence-enabled electrocardiography (AI-ECG) has shown promise, existing approaches rely either on end-to-end black-box models with limited interpretability or on tabular systems that depend on commercial ECG measurement algorithms and deliver suboptimal performance. We introduce ECG-based Predictor-Driven LEF (ECGPD-LEF), a structured framework that integrates foundation model-derived diagnostic probabilities with interpretable modeling for detecting LEF from ECG. Trained on the benchmark EchoNext dataset comprising 72,475 ECG-echocardiogram pairs and evaluated in predefined independent internal (n=5,442) and external (n=16,017) cohorts, our framework achieved robust discrimination for moderate LEF (internal AUROC 88.4%, F1 64.5%; external AUROC 86.8%, F1 53.6%), consistently outperforming the official end-to-end baseline provided with the benchmark across demographic and clinical subgroups. Interpretability analyses identified the high-impact predictors driving LEF risk estimation, including normal ECG, incomplete left bundle branch block, and subendocardial injury in anterolateral leads. Notably, each of these predictors on its own enabled zero-shot-like inference without task-specific retraining (internal AUROC 75.3-81.0%; external AUROC 71.6-78.6%), indicating that ventricular dysfunction is intrinsically encoded within structured diagnostic probability representations. This framework reconciles predictive performance with mechanistic transparency, supporting scalable enhancement through additional predictors and seamless integration with existing AI-ECG systems.
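
The zero-shot-like, single-predictor idea described in the abstract can be illustrated with a minimal sketch: one diagnostic probability is used directly as an LEF risk score, with no further training, and is then scored with AUROC and a best-F1 threshold. Everything below is an assumption made for illustration, not the authors' implementation: the toy probabilities and labels are invented, the helper names `auroc` and `best_f1_threshold` are hypothetical, and the choice of 1 - P(normal ECG) as the risk direction is assumed rather than taken from the paper.

```python
import numpy as np

def auroc(scores, labels):
    """Rank-based AUROC (Mann-Whitney U), using average ranks for tied scores."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(len(scores), dtype=float)
    i = 0
    while i < len(scores):
        j = i
        # extend j over the run of scores tied with position i
        while j + 1 < len(scores) and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # average 1-based rank
        i = j + 1
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    u = ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)

def best_f1_threshold(scores, labels):
    """Scan observed scores as thresholds; predict positive when score >= threshold."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best_t, best_f1 = None, 0.0
    for t in np.unique(scores):
        pred = scores >= t
        tp = np.sum(pred & (labels == 1))
        fp = np.sum(pred & (labels == 0))
        fn = np.sum(~pred & (labels == 1))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

# Invented toy data: P(normal ECG) from a hypothetical diagnosis model, and LEF labels.
p_norm = np.array([0.9, 0.8, 0.7, 0.2, 0.1, 0.4])
lef = np.array([0, 0, 1, 1, 1, 0])

# Assumption: a lower "normal ECG" probability means higher LEF risk.
risk = 1.0 - p_norm

score_auc = auroc(risk, lef)
thr, f1 = best_f1_threshold(risk, lef)
print(f"AUROC={score_auc:.3f}  best-F1 threshold={thr:.2f}  F1={f1:.3f}")
```

Because the score is a probability the diagnosis model already produces, the only quantity chosen on data in this sketch is the decision threshold. The ranking itself, and hence the AUROC, needs no fitting.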

Paper Structure

This paper contains 40 sections, 8 figures, 11 tables.

Figures (8)

  • Figure 1: Flowchart of model development and evaluation. Model development is based on a traditional AI-ECG interpretation model, where a large unlabeled ECG dataset is used for pre-training and a smaller labeled ECG dataset is used for enhanced post-training. The automatic ECG diagnosis model outputs diagnostic probabilities for each ECG finding and serves as the predictor extractor. Based on the predictor extractor, we developed both a single-predictor approach and a multi-predictor approach. The single-predictor approach enables zero-shot-like inference without further training to detect LEF and can serve as an important indicator. The multi-predictor approach further trains a tabular model using ECG-ECHO pairs and provides both local- and global-level explanations based on SHAP values. Model evaluation is conducted on the EchoNext test set and an independently collected ECHO-Note pairs dataset. Multidimensional model dissection is performed, including overall performance evaluation, model interpretability analysis, and subgroup analyses across diverse populations. LEF, low left ventricular ejection fraction; No-LEF, absence of low left ventricular ejection fraction; SHD, structural heart disease; VHD, valvular heart disease; SHAP, Shapley Additive exPlanations.
  • Figure 2: Model performance across different evaluation settings. The first row shows receiver operating characteristic (ROC) curves (a) and precision-recall (PR) curves (b) for five methods using the full feature set (71 predictors). The second row presents the performance of four models evaluated with an increasing number of predictors (1-71), in terms of AUROC (c), AUPRC (d) and F1 score (e). The gray dashed line indicates the Columbia mini model and serves as a reference baseline.
  • Figure 3: Global-level explanation of ECG model predictions across all predictors. The first row shows (a) the cumulative absolute SHAP contributions of all 71 predictors, expressed as percentages of the total contribution, and (b) a SHAP beeswarm plot for the top 10 predictors ranked by the F1 score obtained from the single-predictor method. The second and third rows show (c-h) the relationships between individual predictors (NORM, ILBBB, ISCLA, ANEUR, INJIL and ASMI) and their SHAP values. Point density was estimated using a Gaussian kernel density estimate on the log-transformed values. Each plot includes two vertical reference lines: the first (LEF threshold) indicates the threshold that achieves the optimal F1 score using the single-predictor method, and the second (Diagnosis threshold) indicates the positivity threshold defined by an independent ECG diagnosis model.
  • Figure 4: Local-level explanation of model predictions for a positive and a negative case. The first row shows SHAP waterfall plots for a positive case (a) and a negative case (b), illustrating the top ten ECG predictors contributing to the prediction, with the remaining predictors aggregated as "62 other features". Feature contributions are shown in the log-odds space and sum to the final model output. The second row presents the corresponding predictor-probability relationships for the same cases (c,d). Vertical reference lines indicate positivity thresholds derived from an independent ECG diagnosis model.
  • Figure A.1: Construction of the external test set. (a) Flowchart illustrating the data selection process, including exclusion criteria applied to the MIMIC-IV database to derive the final high-quality testing set. (b) The prompt template used for the Large Language Model (LLM) Clinical Note Extraction Module, designed to extract quantitative ejection fraction (EF) values from unstructured clinical notes into a structured JSON format.
  • ...and 3 more figures
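
The Figure 4 caption notes that feature contributions are shown in log-odds space and sum to the final model output. A minimal sketch of that additivity follows, under stated assumptions: a linear logistic model stands in for the paper's tabular model, which this listing does not specify, and the weights, bias, background data and example case are all invented. Only the predictor names NORM, ILBBB and ISCLA come from the captions. For a linear model with features treated as independent, the SHAP value of feature i is w_i (x_i - E[x_i]), which is the form used here.

```python
import numpy as np

def sigmoid(z):
    """Map log-odds to a probability."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical stand-in model: logistic regression over three predictor probabilities.
feature_names = ["NORM", "ILBBB", "ISCLA"]   # names from the captions; weights are invented
weights = np.array([-2.0, 1.5, 1.0])
bias = -1.0

# Invented background sample defining the expected value of each predictor.
background = np.array([
    [0.8, 0.1, 0.1],
    [0.6, 0.2, 0.3],
    [0.1, 0.6, 0.5],
    [0.9, 0.1, 0.1],
])
x = np.array([0.2, 0.7, 0.4])                # invented case to explain

mean_x = background.mean(axis=0)
base_value = bias + weights @ mean_x         # expected log-odds over the background
contributions = weights * (x - mean_x)       # linear-model SHAP values (independent features)

# Additivity: base value plus all contributions equals the model's log-odds output.
log_odds = base_value + contributions.sum()
assert np.isclose(log_odds, bias + weights @ x)
probability = sigmoid(log_odds)

# Waterfall-style ordering: largest absolute contribution first.
ranked = sorted(zip(feature_names, contributions), key=lambda kv: -abs(kv[1]))
for name, phi in ranked:
    print(f"{name:6s} {phi:+.3f}")
print(f"base={base_value:+.3f}  log-odds={log_odds:+.3f}  p={probability:.3f}")
```

A waterfall plot displays exactly this decomposition: it starts at the base value, adds one bar per contribution, and ends at the model's log-odds output, which the sigmoid converts to a probability.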