Sparse learned kernels for interpretable and efficient medical time series processing

Sully F. Chen; Zhicheng Guo; Cheng Ding; Xiao Hu; Cynthia Rudin

TL;DR

This work introduces Sparse Mixture of Learned Kernels (SMoLK), a lightweight, single-layer, sparse neural architecture for medical time-series processing that delivers competitive performance with orders of magnitude fewer parameters. By learning a bank of convolutional kernels and employing weight absorption and correlated kernel pruning, SMoLK achieves efficient, real-time segmentation of PPG artifacts and robust single-lead ECG atrial fibrillation detection while offering inherently interpretable kernel-level contributions. The approach demonstrates strong generalization to out-of-distribution data and maintains performance under pruning and quantization, making it suitable for low-power wearables. The results suggest that, for certain medical signal tasks, simple, interpretable models can rival deep networks without sacrificing accuracy, enabling deployment on resource-constrained devices and facilitating transparent clinical decision support.

Abstract

Rapid, reliable, and accurate interpretation of medical time-series signals is crucial for high-stakes clinical decision-making. Deep learning methods offer unprecedented performance in medical signal processing but at a cost: they are compute-intensive and lack interpretability. We propose Sparse Mixture of Learned Kernels (SMoLK), an interpretable architecture for medical time series processing. SMoLK learns a set of lightweight flexible kernels that form a single-layer sparse neural network, providing not only interpretability, but also efficiency, robustness, and generalization to unseen data distributions. We introduce parameter reduction techniques to reduce the size of SMoLK's networks while maintaining performance. We test SMoLK on two important tasks common to many consumer wearables: photoplethysmography (PPG) artifact detection and atrial fibrillation detection from single-lead electrocardiograms (ECGs). We find that SMoLK matches the performance of models orders of magnitude larger. It is particularly suited for real-time applications using low-power devices, and its interpretability benefits high-stakes situations.

Paper Structure (50 sections, 11 equations, 9 figures, 7 tables, 1 algorithm)

Introduction
Related Work
Classical Statistical and Machine Learning Techniques
Deep Learning Approaches
Model Architecture
Overview
Classification
Overview
Interpretability
Weight Absorption
Correlated Kernel Pruning
Methods
Datasets and Preprocessing
PPG Segmentation
Training
...and 35 more sections

Figures (9)

Figure 1: The PPG Processing Pipeline. The processing pipeline for the PPG signal quality segmentation task during training and during inference. First, a PPG signal is normalized to a unit normal by subtracting the mean and dividing by the standard deviation. Next, a set of convolutions is applied. Our smallest model is lightweight enough that all of the convolved signals are displayed in this figure. The convolved signals are ceiled, weighted, and summed, and a sigmoid is applied. Finally, the output is smoothed and thresholded to segment the signal. After training, similar kernels are ablated and kernel weights are absorbed to reduce parameter count.
Figure 2: The ECG Processing Pipeline. a. The pipeline for processing a single-lead ECG. First, a set of learned convolutions is applied to the ECG to produce several feature maps. Then, feature-wise means are computed, which are used as inputs to a linear model. Global frequency information, obtained from a power spectrum, is also used as input to the linear model to yield a logit. b. The reverse pipeline for interpreting the prediction in subfigure a from the learned kernel model. First, feature maps are generated by the learned kernel convolutions, in the same manner as in a. However, instead of computing the mean value of each feature map, the feature maps are multiplied element-wise with their corresponding class weights. The power spectrum is not used in the interpretation. Lastly, a transposed convolution is applied to compute the importance of each part of the input signal. This method is almost an exact inverse of the forward procedure, allowing for a principled assignment of importance to the input signal that directly correlates to the output class logit. Notably, this method correctly labels the increased PR-interval as the most important feature in the classification of this ECG as $1^{st}$-degree heart block, indicating that our model has learned important features rather than spurious correlations.
Figure 3: Kernel Statistics. The computed "kernel importance" of each kind of kernel in each model size (top row) and the empirical average activation of each kernel group observed over the clean and artifact parts of the dataset (bottom row). Significant differences in computed "kernel importances" are reflected in the empirical output values of those kernel groups over the test set. Significance was computed via a two-tailed independent samples t-test with Bonferroni correction. All differences between the empirical kernel outputs are significant, owing to the extremely large sample size. All box-and-whisker plots use the median value as the central line and the interquartile range (IQR) as the box, with the whiskers denoting the minimum and maximum value of the distribution. Outliers are defined as points that lie outside of $\pm 1.5\times \text{IQR}$ and were excluded from the plot for clarity, though all points were included in statistical analysis.
Figure 4: The contribution of each kernel group to the overall output signal. Noticeably, the "long" kernels have a positive signal even when convolved over clean segments, the "moderate" kernels have a signal close to zero when convolved over clean segments and a positive value over the artifact, and the "small" kernels have negative values except when convolved over the artifact.
Figure 5: Selected ECG Interpretations. Various examples of $1^\circ$ atrioventricular block from the Zheng et al. dataset, along with the corresponding contribution maps as computed by Algorithm 1. Redder colors indicate greater contribution to the $1^\circ$ atrioventricular block, while bluer colors indicate lesser contribution.
...and 4 more figures
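
The inference steps described in the Figure 1 caption (normalize, convolve, rectify, weight and sum, sigmoid, smooth, threshold) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the zero-padded "same" convolution, the moving-average smoother, and the default smoothing width and threshold are all assumptions made for illustration, and the per-feature nonlinearity is assumed here to be a ReLU-style rectification.

```python
import math
from statistics import fmean, pstdev


def normalize(signal):
    """Z-score the signal: subtract the mean, divide by the standard deviation."""
    mu = fmean(signal)
    sigma = pstdev(signal) or 1.0  # guard against a constant signal
    return [(x - mu) / sigma for x in signal]


def conv_same(signal, kernel):
    """Slide the kernel over the signal with zero padding, keeping the input length."""
    n, left = len(signal), len(kernel) // 2
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            t = i + j - left
            if 0 <= t < n:
                acc += w * signal[t]
        out.append(acc)
    return out


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def moving_average(values, width):
    """Centered moving average; windows shrink at the edges (assumed smoother)."""
    half, n = width // 2, len(values)
    out = []
    for i in range(n):
        window = values[max(0, i - half): min(n, i + half + 1)]
        out.append(sum(window) / len(window))
    return out


def segment_ppg(signal, kernels, weights, bias=0.0, smooth_width=5, threshold=0.5):
    """Return a per-sample boolean mask, True where an artifact is predicted."""
    x = normalize(signal)
    logits = [bias] * len(x)
    for kernel, w in zip(kernels, weights):
        feature = conv_same(x, kernel)
        for i, v in enumerate(feature):
            logits[i] += w * max(v, 0.0)  # assumed ReLU, then weighted sum
    probs = [sigmoid(z) for z in logits]
    smoothed = moving_average(probs, smooth_width)
    return [p > threshold for p in smoothed]
```

In this sketch every operation before the sigmoid is a single convolution followed by a weighted sum, so each kernel's share of a sample's logit can be read off directly as `w * max(v, 0.0)`. That additive structure is what makes per-kernel contributions easy to inspect.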
