SDE-Attention: Latent Attention in SDE-RNNs for Irregularly Sampled Time Series with Missing Data

Yuting Fang, Qouc Le Gia, Flora Salim

TL;DR

The paper addresses robust classification of irregular, partially observed time series by augmenting a fixed SDE-RNN backbone with latent-space attention. It proposes three plug-in modules (static channel attention, time-varying feature attention, and pyramidal multi-scale attention) and systematically evaluates their impact on synthetic periodic data and on real UCR/UEA benchmarks under varying missingness. Across experiments, latent-space attention consistently improves over the vanilla backbone, with LSTM-based time-varying feature attention (TVF-L) delivering the strongest and most stable gains, especially at higher missing rates. The findings offer practical guidance on choosing attention mechanisms for SDE-based sequence models and highlight the value of explicit channel-level gating in irregular time series tasks.

Abstract

Irregularly sampled time series with substantial missing observations are common in healthcare and sensor networks. We introduce SDE-Attention, a family of SDE-RNNs equipped with channel-level attention on the latent pre-RNN state, including channel recalibration, time-varying feature attention, and pyramidal multi-scale self-attention. We compare these variants on a synthetic periodic dataset and on real-world benchmarks under varying missing rates. Latent-space attention consistently improves over a vanilla SDE-RNN. On the univariate UCR datasets, the LSTM-based time-varying feature model SDE-TVF-L achieves the highest average accuracy, raising mean performance by approximately 4, 6, and 10 percentage points over the baseline at 30%, 60%, and 90% missingness, respectively (averaged across datasets). On multivariate UEA benchmarks, attention-augmented models again outperform the backbone, with SDE-TVF-L yielding up to a 7% gain in mean accuracy under high missingness. Among the proposed mechanisms, time-varying feature attention is the most robust on univariate datasets. On multivariate datasets, different attention types excel on different tasks, showing that SDE-Attention can be flexibly adapted to the structure of each problem.

Paper Structure

This paper contains 19 sections, 9 equations, 4 figures, 5 tables, 1 algorithm.

Figures (4)

  • Figure 1: SDE-RNN backbone. Between observation times, the latent state is evolved by an SDE solver; at each observation time $t_i$, a GRU cell updates the state using the reweighted latent $h'_i$. The green bracket indicates the insertion point of latent channel attention; for other attention variants, the same backbone is used with a different attention module at the corresponding latent location.
  • Figure 2: Attention modules used in our SDE-Attention framework. (a) Static channel attention reweights latent dimensions based on a global summary of hidden states. (b) TVF uses an LSTM or Transformer along time to produce time-varying feature-wise gates. (c) The pyramidal module builds multi-scale representations via repeated downsampling, self-attention, and upsampling, and fuses features from all levels.
  • Figure 3: Accuracy as a function of the missing rate on three representative UCR datasets: Wafer, ProximalPhalanxTW, and MoteStrain. Each curve compares the SDE-RNN baseline with four hidden-level attention variants (SDE-PYR, SDE-TVF-L, SDE-TVF-T, and SDE-SCHA).
  • Figure 4: Accuracy as a function of the missing rate on three representative UEA datasets: Epilepsy, BasicMotions, and ERing. Each curve compares the SDE-RNN baseline with four hidden-level attention variants (SDE-PYR, SDE-TVF-L, SDE-TVF-T, and SDE-SCHA).
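
The static channel attention described in the Figure 2(a) caption can be illustrated with a small sketch. This is not the authors' implementation: the captions only say that latent dimensions are reweighted using a global summary of hidden states. The mean-over-time summary, the two-layer bottleneck with ReLU and sigmoid (in the style of squeeze-and-excitation gating), and the names `static_channel_gates`, `reweight`, `W1`, and `W2` are all assumptions made for illustration.

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def static_channel_gates(H, W1, W2):
    """Compute one gate per latent channel from a global summary of H.

    H  : list of T latent states, each a list of D floats
    W1 : R x D bottleneck weights (hypothetical parameter)
    W2 : D x R expansion weights (hypothetical parameter)
    The mean-over-time summary and the bottleneck are assumptions.
    """
    T, D = len(H), len(H[0])
    # Global summary: average each latent channel over all time steps.
    summary = [sum(h[d] for h in H) / T for d in range(D)]
    # Bottleneck with ReLU, then expansion with sigmoid to get gates in (0, 1).
    hidden = [max(0.0, z) for z in matvec(W1, summary)]
    return [sigmoid(z) for z in matvec(W2, hidden)]


def reweight(H, gates):
    """Apply the same channel gates at every time step: h'_i = g * h_i."""
    return [[g * x for g, x in zip(gates, h)] for h in H]


if __name__ == "__main__":
    rng = random.Random(0)
    T, D, R = 5, 4, 2  # time steps, latent channels, bottleneck width
    H = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(T)]
    W1 = [[rng.gauss(0, 0.5) for _ in range(D)] for _ in range(R)]
    W2 = [[rng.gauss(0, 0.5) for _ in range(R)] for _ in range(D)]
    gates = static_channel_gates(H, W1, W2)
    H_prime = reweight(H, gates)
    print("gates:", gates)
    print("first reweighted state:", H_prime[0])
```

Because the gates depend only on the global summary, every time step is scaled by the same per-channel factor. A time-varying variant such as the one in Figure 2(b) would instead produce a separate gate vector for each observation time.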