SDE-Attention: Latent Attention in SDE-RNNs for Irregularly Sampled Time Series with Missing Data
Yuting Fang, Qouc Le Gia, Flora Salim
TL;DR
The paper addresses robust classification of irregular, partially observed time series by augmenting a fixed SDE-RNN backbone with latent-space attention. It proposes three plug-in modules (static channel attention, time-varying feature attention, and pyramidal multi-scale attention) and systematically evaluates their impact on synthetic periodic data as well as real UCR/UEA benchmarks under varying missingness. Across experiments, latent-space attention consistently improves over the vanilla backbone, with time-varying feature attention using an LSTM (TVF-L) delivering the strongest and most stable gains, especially at higher missing rates. The findings offer practical guidance on choosing attention mechanisms for SDE-based sequence models and highlight the value of explicit channel-level gating in irregular time series tasks.
Abstract
Irregularly sampled time series with substantial missing observations are common in healthcare and sensor networks. We introduce SDE-Attention, a family of SDE-RNNs equipped with channel-level attention on the latent pre-RNN state, including channel recalibration, time-varying feature attention, and pyramidal multi-scale self-attention. We compare these variants on a synthetic periodic dataset and on real-world benchmarks under varying missing rates. Latent-space attention consistently improves over a vanilla SDE-RNN. On the univariate UCR datasets, the LSTM-based time-varying feature model SDE-TVF-L achieves the highest average accuracy, raising mean performance by approximately 4, 6, and 10 percentage points over the baseline at 30%, 60%, and 90% missingness, respectively (averaged across datasets). On multivariate UEA benchmarks, attention-augmented models again outperform the backbone, with SDE-TVF-L yielding up to a 7% gain in mean accuracy under high missingness. Among the proposed mechanisms, time-varying feature attention is the most robust on univariate datasets. On multivariate datasets, different attention types excel on different tasks, showing that SDE-Attention can be flexibly adapted to the structure of each problem.
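To make "channel-level attention on the latent pre-RNN state" concrete, the following is a minimal sketch of one possible form of channel recalibration. It assumes a squeeze-and-excitation-style gate (a small bottleneck followed by a sigmoid) applied to a single latent vector before it would be passed to the RNN update. The function `channel_gate`, the weight matrices `w1` and `w2`, and the sizes `d` and `r` are hypothetical names chosen for illustration, and the weights are random rather than learned; this is not the authors' implementation.

```python
import math
import random


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_gate(z, w1, w2):
    """Gate each channel of a latent vector z (length d).

    w1: r x d bottleneck weights, w2: d x r expansion weights.
    Both are hypothetical parameters; in a trained model they would be learned.
    Returns (gated latent vector, per-channel gate values in (0, 1)).
    """
    # Bottleneck projection with ReLU: d channels -> r hidden units.
    hidden = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w1]
    # Expansion back to d channels, squashed into (0, 1) by the sigmoid.
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Rescale every latent channel by its own gate.
    gated = [g * zi for g, zi in zip(gates, z)]
    return gated, gates


# Illustrative latent state with d = 4 channels and a bottleneck of r = 2.
rng = random.Random(0)
d, r = 4, 2
w1 = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(r)]
w2 = [[rng.uniform(-1, 1) for _ in range(r)] for _ in range(d)]
z = [0.5, -1.2, 0.3, 2.0]

gated, gates = channel_gate(z, w1, w2)
```

In this sketch the gate depends only on the current latent vector. A time-varying variant, under the same assumptions, would instead compute the gates from a recurrent summary of earlier latent states.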
