Two-Stage Aggregation with Dynamic Local Attention for Irregular Time Series

Xingyu Chen, Xiaochen Zheng, Amina Mollaysa, Manuel Schürch, Ahmed Allam, Michael Krauthammer

TL;DR

This work tackles irregular multivariate time series by introducing TADA, a two-stage aggregation framework combining temporal embedding (TE) and dynamic local attention (DLA) to address time-wise and feature-wise irregularities. TE computes a fixed-dimensional embedding per time step by weighting observed features, while DLA applies feature-specific adaptive time windows to harmonize irregular sampling, feeding a Hierarchical MLP Mixer that captures multi-scale patterns. The approach yields state-of-the-art results on PhysioNet, MIMIC IV, and Human Activity datasets, and offers interpretable analyses showing how DLA adapts to different feature sampling rates. By avoiding heavy interpolation and modeling at multiple scales, TADA provides a robust, scalable solution with practical implications for real-world healthcare and environmental time-series applications.

Abstract

Irregular multivariate time series data is characterized by varying time intervals between consecutive observations of measured variables/signals (i.e., features) and by varying sampling rates (i.e., recordings/measurements) across these features. Modeling time series while accounting for these irregularities remains a challenging task for machine learning methods. Here, we introduce TADA, a Two-stage Aggregation process with Dynamic local Attention, to harmonize time-wise and feature-wise irregularities in multivariate time series. In the first stage, the irregular time series undergoes temporal embedding (TE) using all available features at each time step. This process preserves the contribution of each available feature and generates a fixed-dimensional representation per time step. The second stage introduces a dynamic local attention (DLA) mechanism with adaptive window sizes. DLA aggregates time recordings using feature-specific windows to harmonize irregular time intervals, capturing feature-specific sampling rates. Hierarchical MLP mixer layers then process the output of DLA through multiscale patching to leverage information at various scales for the downstream tasks. TADA outperforms state-of-the-art methods on three real-world datasets, including the latest MIMIC IV dataset, highlighting its effectiveness in handling irregular multivariate time series and its potential for various real-world applications.
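
The two aggregation stages described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the softmax feature weighting in `temporal_embedding`, the distance-based attention score in `dynamic_local_attention`, and all names (`W`, `scores`, `queries`, `radius`) are assumptions made for illustration. The sketch also treats the two stages independently, since how they are wired together is not specified in this summary. Missing entries are assumed to be filled with 0 and flagged by a binary mask.

```python
import numpy as np

def temporal_embedding(x, mask, W, scores):
    """Stage 1 (sketch): one fixed-size vector per time step.

    x, mask: (T, D) values and 0/1 observation mask (missing values set to 0).
    W: (D, H) hypothetical per-feature projection; scores: (D,) feature logits.
    """
    # Softmax over the features that are actually observed at each step.
    weights = mask * np.exp(scores - scores.max())                  # (T, D)
    weights = weights / np.maximum(weights.sum(axis=1, keepdims=True), 1e-12)
    return (weights * x) @ W                                        # (T, H)

def dynamic_local_attention(t, x, mask, queries, radius):
    """Stage 2 (sketch): aggregate each feature inside its own time window.

    t: (T,) timestamps; queries: (L,) reference times;
    radius: (D,) per-feature window half-widths (learned in the paper, fixed here).
    Returns an array of shape (L, D) on a regular grid of query times.
    """
    dist = np.abs(queries[:, None] - t[None, :])                    # (L, T)
    inside = dist[:, :, None] <= radius[None, None, :]              # (L, T, D)
    valid = inside & (mask[None, :, :] > 0)
    # Assumed score: closer observations get larger attention weights.
    logits = -(dist[:, :, None] / radius[None, None, :]) ** 2
    w = np.where(valid, np.exp(logits), 0.0)
    w = w / np.maximum(w.sum(axis=1, keepdims=True), 1e-12)
    return (w * x[None, :, :]).sum(axis=1)                          # (L, D)

rng = np.random.default_rng(0)
T, D, H, L = 6, 3, 4, 5
t = np.sort(rng.uniform(0, 1, T))
mask = (rng.uniform(size=(T, D)) < 0.6).astype(float)
x = rng.normal(size=(T, D)) * mask
z = temporal_embedding(x, mask, rng.normal(size=(D, H)), np.zeros(D))
u = dynamic_local_attention(t, x, mask, np.linspace(0, 1, L),
                            np.array([0.1, 0.3, 0.6]))
print(z.shape, u.shape)  # (6, 4) (5, 3)
```

A sparsely sampled feature would typically need a larger `radius` than a densely sampled one to find any observation near each query; in this sketch a query whose window contains no observation simply returns 0.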

Paper Structure

This paper contains 23 sections, 15 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Illustration of selected features in a sample time series from the MIMIC IV dataset.
  • Figure 2: Data Structure and Temporal Embedding Module
  • Figure 3: Overview of the TADA architecture. (a) shows the temporal embedding (TE) at a specific time step over all features. (b) shows the dynamic local attention (DLA) on a specific feature across all observed time steps. (c) shows the MLP mixers accepting learned representations from DLA as inputs. Activation and normalization layers are omitted for simplicity. The figure shows a segmentation of the representations using a patch size of two.
  • Figure 4: Influence of hyperparameters. Model performance for (a), (b) different combinations of patch size and number of attention queries ($L$), and (c), (d) different combinations of patch size and number of MLP mixer layers, on the Human Activity and PhysioNet datasets.
  • Figure 5: DLA visualization for the features (a) Diastolic Blood Pressure, (b) Temperature, (c) SpO$_2$ and (d) Glucose in the MIMIC IV dataset. R denotes the learned window size for each feature's attention scores (after normalization). Top row: Attention score of DLA. Middle row: Original values of signals. Bottom row: Weighted queries after DLA.
  • ...and 2 more figures
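
The patch segmentation mentioned in the Figure 3 caption can be sketched as follows. This is only an illustration of splitting a DLA-style output into non-overlapping patches at several scales; the function name `patchify`, the zero-padding of an incomplete final patch, and the chosen patch sizes are assumptions, not details taken from the paper.

```python
import numpy as np

def patchify(u, patch_size):
    """Split u of shape (L, C) into non-overlapping patches along the first axis.

    Returns an array of shape (ceil(L / patch_size), patch_size, C).
    Assumption: an incomplete final patch is zero-padded.
    """
    L, C = u.shape
    pad = (-L) % patch_size
    if pad:
        u = np.concatenate([u, np.zeros((pad, C), dtype=u.dtype)], axis=0)
    return u.reshape(-1, patch_size, C)

u = np.arange(12, dtype=float).reshape(6, 2)   # 6 query steps, 2 channels
for p in (2, 3, 4):                             # hypothetical patch sizes
    print(p, patchify(u, p).shape)
# 2 (3, 2, 2)
# 3 (2, 3, 2)
# 4 (2, 4, 2)
```

Each patch size gives a different temporal resolution of the same sequence, which is the kind of multiscale view a hierarchical mixer could then process.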