UniCoMTE: A Universal Counterfactual Framework for Explaining Time-Series Classifiers on ECG Data

Justin Li, Efe Sencan, Jasper Zheng Duan, Vitus J. Leung, Stephen Tsaur, Ayse K. Coskun

TL;DR

The paper introduces UniCoMTE, a model-agnostic, counterfactual framework for explaining multivariate time-series classifiers, demonstrated on ECG data. It constructs minimal time–lead segment substitutions to flip model predictions, offering concise, actionable explanations that outperform LIME and SHAP in clarity and clinical relevance. The evaluation combines quantitative metrics (comprehensibility and generalizability) with qualitative expert feedback, showing robust, generalizable explanations and positive clinician acceptance. The work also emphasizes data quality and provides a path toward broader applicability across time-series domains and datasets, aiming to increase trust and adoption of deep learning in clinical settings.

Abstract

Machine learning models, particularly deep neural networks, have demonstrated strong performance in classifying complex time series data. However, their black-box nature limits trust and adoption, especially in high-stakes domains such as healthcare. To address this challenge, we introduce UniCoMTE, a model-agnostic framework for generating counterfactual explanations for multivariate time series classifiers. The framework identifies the temporal features that most heavily influence a model's prediction by modifying the input sample and assessing the effect of each modification on that prediction. UniCoMTE is compatible with a wide range of model architectures and operates directly on raw time series inputs. In this study, we evaluate UniCoMTE's explanations on a time series ECG classifier. We quantify explanation quality by comparing the comprehensibility of our explanations with that of established techniques (LIME and SHAP) and by assessing how well the explanations generalize to similar samples. Furthermore, clinical utility is assessed through a questionnaire completed by medical experts who review counterfactual explanations presented alongside original ECG samples. Results show that our approach produces concise, stable, and human-aligned explanations that outperform existing methods in both clarity and applicability. By linking model predictions to meaningful signal patterns, the framework advances the interpretability of deep learning models for real-world time series applications.
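
The idea of modifying an input and watching the prediction change can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes a greedy search that copies whole leads from a "distractor" sample of the target class into the sample being explained, and it uses a hypothetical toy classifier (toy_predict_proba) in place of a real ECG model. The function name greedy_counterfactual, the lead-level granularity, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def greedy_counterfactual(x, distractor, predict_proba, target):
    """Copy leads from `distractor` into `x`, one at a time, until the
    classifier predicts `target`. Arrays have shape (leads, time).

    Illustrative assumption: whole leads are swapped; a finer-grained
    search would also restrict the swap to a time window.
    """
    cf = x.copy()
    swapped = []
    remaining = list(range(x.shape[0]))
    while remaining and int(np.argmax(predict_proba(cf))) != target:
        # Try each remaining lead and keep the one that raises the
        # target-class probability the most.
        best_lead, best_p = None, -1.0
        for lead in remaining:
            trial = cf.copy()
            trial[lead] = distractor[lead]
            p = predict_proba(trial)[target]
            if p > best_p:
                best_lead, best_p = lead, p
        cf[best_lead] = distractor[best_lead]
        swapped.append(best_lead)
        remaining.remove(best_lead)
    flipped = int(np.argmax(predict_proba(cf))) == target
    return cf, swapped, flipped


def toy_predict_proba(sample):
    """Hypothetical two-class model: class 1 becomes more likely as the
    mean amplitude of lead 2 increases. Stands in for a real classifier."""
    p1 = 1.0 / (1.0 + np.exp(-sample[2].mean()))
    return np.array([1.0 - p1, p1])


# Synthetic 4-lead signals: `x` is classified as class 0, `distractor` as class 1.
x = -np.ones((4, 50))
distractor = np.ones((4, 50))

cf, swapped, flipped = greedy_counterfactual(x, distractor, toy_predict_proba, target=1)
print("leads substituted:", swapped, "| prediction flipped:", flipped)
```

Because the search only needs predicted probabilities, any classifier that exposes such a function could be plugged in, which is the sense in which a method of this kind is model-agnostic. The list of substituted leads is the explanation: the fewer entries it has, the more concise the counterfactual.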

Paper Structure

This paper contains 18 sections, 6 figures, 3 tables.

Figures (6)

  • Figure 1: SHAP surface plot for first-degree atrioventricular block (1dAVb). The plot shows how individual ECG samples across 12 leads contribute to the classification decision.
  • Figure 2: Example LIME output showing the ten most influential features for a misclassified ECG sample. The bar chart illustrates relative importance weights assigned to signal.
  • Figure 3: Distribution of scores across all expert responses and conditions
  • Figure 5: UniCoMTE counterfactual examples. Original ECGs are shown in black and counterfactuals in red.
  • Figure 6: The deep neural network architecture for ECG classification adopted from Ribeiro et al. (2020). The model applies convolutional and residual layers to extract temporal patterns from 12-lead ECG signals.
  • ...and 1 more figure