ECG-IMN: Interpretable Mesomorphic Neural Networks for 12-Lead Electrocardiogram Interpretation

Vajira Thambawita, Jonas L. Isaksen, Jørgen K. Kanters, Hugo L. Hammer, Pål Halvorsen

TL;DR

The paper addresses the opacity of deep learning models for 12-lead ECG interpretation and introduces ECG-IMN, an Interpretable Mesomorphic Neural Network. ECG-IMN acts as a hypernetwork: it generates an instance-specific weight map for each input, so that the decision follows $z = \mathbf{W} \cdot \mathbf{X} + b$, where $\cdot$ denotes the sum of element-wise products over leads and time. Because this formulation is strictly locally linear, it yields intrinsic explanations through the exact attribution $\mathbf{I}_{attr} = \mathbf{W} \odot \mathbf{X}$ and avoids post-hoc approximation methods. A Transition Decoder maps latent features to high-resolution weight maps, enabling precise localization of abnormalities across leads and time. The system achieves competitive AUROC on binary classification tasks from the PTB-XL dataset while providing faithful, instance-specific explanations. The work is complemented by visualization strategies for these explanations, an interactive HuggingFace Space, and open-source code to support clinical validation and transparent benchmarking, representing a principled step toward white-box cardiac diagnostics.
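The exactness of the attribution can be checked with a small pure-Python sketch. This is not the authors' code. It assumes a $12 \times L$ input $\mathbf{X}$ and a weight map $\mathbf{W}$ of the same shape, and reads $\mathbf{W} \cdot \mathbf{X}$ as the sum of element-wise products. Under that reading, the entries of $\mathbf{I}_{attr}$ add up to exactly $z - b$, so nothing about the decision is left unexplained. The toy values and function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # rows = leads, columns = time samples


def logit(W: Matrix, X: Matrix, b: float) -> float:
    """Local linear decision: z = sum over leads c and times t of W[c][t] * X[c][t], plus b."""
    return sum(w * x for w_row, x_row in zip(W, X) for w, x in zip(w_row, x_row)) + b


def attribution(W: Matrix, X: Matrix) -> Matrix:
    """Exact attribution I_attr = W ⊙ X (element-wise product, same shape as X)."""
    return [[w * x for w, x in zip(w_row, x_row)] for w_row, x_row in zip(W, X)]


# Hypothetical toy input: 12 leads with 4 samples each, plus a toy weight map and bias.
n_leads, length = 12, 4
X = [[0.1 * (c + 1) * (t - 1.5) for t in range(length)] for c in range(n_leads)]
W = [[0.5 * (-1) ** (c + t) for t in range(length)] for c in range(n_leads)]
b = 0.25

z = logit(W, X, b)
I_attr = attribution(W, X)
total = sum(sum(row) for row in I_attr)

# Completeness: the attribution map accounts for the whole logit except the bias.
assert abs((total + b) - z) < 1e-12
print(f"z = {z:.4f}, sum(I_attr) = {total:.4f}, b = {b}")
```

Contrast this with post-hoc methods such as Grad-CAM or SHAP, which estimate importance after the fact. Here the explanation is simply a rearrangement of the terms the model itself sums.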

Abstract

Deep learning has achieved expert-level performance in automated electrocardiogram (ECG) diagnosis, yet the "black-box" nature of these models hinders their clinical deployment. Trust in medical AI requires not just high accuracy but also transparency regarding the specific physiological features driving predictions. Existing explainability methods for ECGs typically rely on post-hoc approximations (e.g., Grad-CAM and SHAP), which can be unstable, computationally expensive, and unfaithful to the model's actual decision-making process. In this work, we propose the ECG-IMN, an Interpretable Mesomorphic Neural Network tailored for high-resolution 12-lead ECG classification. Unlike standard classifiers, the ECG-IMN functions as a hypernetwork: a deep convolutional backbone generates the parameters of a strictly linear model specific to each input sample. This architecture enforces intrinsic interpretability, as the decision logic is mathematically transparent and the generated weights (W) serve as exact, high-resolution feature attribution maps. We introduce a transition decoder that effectively maps latent features to sample-wise weights, enabling precise localization of pathological evidence (e.g., ST-elevation, T-wave inversion) in both time and lead dimensions. We evaluate our approach on the PTB-XL dataset for classification tasks, demonstrating that the ECG-IMN achieves competitive predictive performance (AUROC comparable to black-box baselines) while providing faithful, instance-specific explanations. By explicitly decoupling parameter generation from prediction execution, our framework bridges the gap between deep learning capability and clinical trustworthiness, offering a principled path toward "white-box" cardiac diagnostics.
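The decoupling of parameter generation from prediction execution can be illustrated with a deliberately tiny pure-Python sketch. It is not the authors' implementation: the learned convolutions are replaced by fixed operations. The following are all assumptions:

- a pooling factor of 2 per stage;
- the tanh in the decoder;
- the bias scale;
- the sigmoid that turns the logit into a binary-task probability.

Only the overall data flow follows the Figure 1 caption. Three max-pooling stages produce a latent code Z. A nearest-neighbor upsampling decoder restores the weight map W to the input length. A globally pooled bias generator yields b. The output is then computed linearly from W, X and b.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # rows = leads, columns = time samples


def max_pool(row: List[float], k: int) -> List[float]:
    # Non-overlapping max pooling along time (assumes len(row) is divisible by k).
    return [max(row[i:i + k]) for i in range(0, len(row), k)]


def upsample_nearest(row: List[float], k: int) -> List[float]:
    # Nearest-neighbor upsampling: repeat every latent value k times.
    return [v for v in row for _ in range(k)]


def encoder(X: Matrix, stages: int = 3, k: int = 2) -> Matrix:
    # Toy stand-in for f_theta: only the pooling of each stage is kept;
    # the learned k(3,15) convolutions and BN are omitted.
    Z = X
    for _ in range(stages):
        Z = [max_pool(row, k) for row in Z]
    return Z


def transition_decoder(Z: Matrix, stages: int = 3, k: int = 2) -> Matrix:
    # Toy stand-in for g_phi: upsample back to length L, then apply a fixed tanh
    # (hypothetical, in place of learned convolutions) so W has the shape of X.
    W = Z
    for _ in range(stages):
        W = [upsample_nearest(row, k) for row in W]
    return [[math.tanh(v) for v in row] for row in W]


def bias_generator(Z: Matrix, scale: float = 0.1) -> float:
    # Toy stand-in for h_psi: global average pooling, then a hypothetical scalar scale.
    values = [v for row in Z for v in row]
    return scale * sum(values) / len(values)


def predict(X: Matrix) -> Tuple[float, Matrix, float]:
    # 1) Parameter generation: W and b are produced from X by the (toy) networks.
    Z = encoder(X)
    W, b = transition_decoder(Z), bias_generator(Z)
    # 2) Prediction execution: strictly linear in X once W and b are fixed.
    z = sum(w * x for w_row, x_row in zip(W, X) for w, x in zip(w_row, x_row)) + b
    # Assumed sigmoid output for a binary task.
    return 1.0 / (1.0 + math.exp(-z)), W, b


# Hypothetical 12-lead input with L = 16 samples (divisible by 2**3).
X = [[math.sin(0.4 * t + c) for t in range(16)] for c in range(12)]
p, W, b = predict(X)
print(f"P(positive class) = {p:.3f}, W shape = {len(W)} x {len(W[0])}, b = {b:.3f}")
```

The design point is that all the nonlinearity lives in `predict`'s first step. Once W and b are returned, they are themselves the explanation of the second step.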

Paper Structure (19 sections, 12 equations, 2 figures, 1 table)

Figures (2)

  • Figure 1: Detailed Architecture of the Interpretable Mesomorphic Neural Network (IMN). The model operates as a hypernetwork where a deep neural network generates the parameters of a local linear model. (1) Parameter Generation Pathway: The input ECG signal $\mathbf{X} \in \mathbb{R}^{12 \times L}$ is processed by a convolutional backbone encoder $f_\theta$, consisting of three stages of convolutions with asymmetric kernels $k(3,15)$ and max-pooling, to produce a compressed latent representation $\mathbf{Z}$. This latent code branches into two generators: the Transition Decoder $g_\phi$, which utilizes upsampling and convolutions to generate high-resolution weight maps $\mathbf{W}$, and the Bias Generator $h_\psi$, which computes scalar biases $b$ via global pooling. (2) Inference Pathway: The final class logits $y$ are computed via a strictly interpretable linear equation $y = \sum(\mathbf{W} \odot \mathbf{X}) + b$, where the generated weights are applied element-wise to the original input via a skip connection. Notation: $L$: input signal length; $K$: number of output classes/tasks; $k(h,w)$: convolution kernel size (height $\times$ width); $C_{in} \to C_{out}$: channel dimensions; BN: Batch Normalization; Up: Nearest-neighbor upsampling; $\odot$: Element-wise multiplication; $\Sigma$: Summation over channel and time dimensions.
  • Figure 2: Comparison of Grad-CAM and IMN-based intrinsic attribution across aggregation scales (100 Hz). The top row shows attribution maps obtained using a modified Grad-CAM, visualizing class-discriminative activations for a fixed window and stride. The middle row presents intrinsic importance maps from the Single-Linear IMN, and the bottom row shows those from the Categorical IMN, all evaluated using identical inputs and window and stride settings. Red intensity denotes positive contribution to the myocardial infarction (MI) prediction, with opacity proportional to contribution magnitude. For visual clarity, only three representative ECG leads are displayed. Predicted MI probabilities from the best-performing checkpoints are $P(\mathrm{MI})=0.556$ for Grad-CAM, $P(\mathrm{MI})=0.945$ for the Single-Linear IMN, and $P(\mathrm{MI})=0.741$ for the Categorical IMN.
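The window-and-stride aggregation used for Figure 2 can be sketched as follows. The sketch makes three assumptions:

- Each window's score is the sum of one lead's per-sample attributions $\mathbf{W} \odot \mathbf{X}$ inside that window. The caption does not state the aggregation function, so summation is an assumption.
- Red opacity is the positive window score normalized by the largest positive score.
- The function names and the 0.2 s window with 0.1 s stride are hypothetical.

Only the 100 Hz sampling rate comes from the caption.

```python
from typing import List, Tuple

FS = 100  # sampling rate in Hz, as stated in the Figure 2 caption


def window_scores(attr_row: List[float], window_s: float, stride_s: float,
                  fs: int = FS) -> List[Tuple[int, int, float]]:
    """Sum one lead's per-sample attributions inside sliding windows.

    Returns (start_sample, end_sample, summed_contribution) triples.
    """
    win = int(round(window_s * fs))
    step = int(round(stride_s * fs))
    if win <= 0 or step <= 0:
        raise ValueError("window and stride must each span at least one sample")
    return [(s, s + win, sum(attr_row[s:s + win]))
            for s in range(0, len(attr_row) - win + 1, step)]


def red_opacity(scores: List[Tuple[int, int, float]]) -> List[float]:
    """Opacity in [0, 1] proportional to positive contribution; negative windows get 0."""
    pos = [max(0.0, score) for _, _, score in scores]
    peak = max(pos, default=0.0)
    return [p / peak if peak > 0 else 0.0 for p in pos]


# Hypothetical 1-second attribution trace for one lead, with a positive bump mid-trace.
attr = [0.0] * 100
for i in range(40, 60):
    attr[i] = 0.05
scores = window_scores(attr, window_s=0.2, stride_s=0.1)
print([round(o, 2) for o in red_opacity(scores)])
```

Changing `window_s` and `stride_s` changes the aggregation scale. Wider windows give smoother but coarser maps; narrower windows localize evidence more tightly in time.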