Two-Stage Aggregation with Dynamic Local Attention for Irregular Time Series
Xingyu Chen, Xiaochen Zheng, Amina Mollaysa, Manuel Schürch, Ahmed Allam, Michael Krauthammer
TL;DR
This work tackles irregular multivariate time series by introducing TADA, a two-stage aggregation framework that combines temporal embedding (TE) and dynamic local attention (DLA) to address time-wise and feature-wise irregularities. TE computes a fixed-dimensional embedding per time step by weighting the observed features. DLA applies feature-specific adaptive time windows to harmonize irregular sampling, and its output feeds a hierarchical MLP Mixer that captures multi-scale patterns. The approach achieves state-of-the-art results on the PhysioNet, MIMIC-IV, and Human Activity datasets, and interpretability analyses show how DLA adapts to the different sampling rates of individual features. By avoiding heavy interpolation and modeling at multiple scales, TADA offers a robust, scalable solution with practical implications for real-world healthcare and environmental time-series applications.
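The temporal embedding step described above maps a time step with any subset of observed features to a vector of fixed size. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation: the per-feature parameters `proj`, `bias`, and `score` are hypothetical stand-ins for learned weights, and the softmax over observed features is one plausible weighting scheme chosen for illustration.

```python
import math
from typing import List, Optional, Sequence


def temporal_embedding(
    row: Sequence[Optional[float]],
    proj: Sequence[Sequence[float]],
    bias: Sequence[Sequence[float]],
    score: Sequence[float],
) -> List[float]:
    """Embed one time step into a fixed k-dim vector (hypothetical sketch).

    row[d]   -- value of feature d at this time step, or None if unobserved
    proj[d]  -- hypothetical learned projection vector (length k) for feature d
    bias[d]  -- hypothetical learned bias vector (length k) for feature d
    score[d] -- hypothetical learned relevance logit for feature d
    """
    k = len(proj[0])
    observed = [d for d, v in enumerate(row) if v is not None]
    if not observed:
        # Nothing was measured at this time step: return a zero vector.
        return [0.0] * k

    # Softmax of the relevance logits, restricted to the observed features,
    # so unobserved features receive no weight at all.
    m = max(score[d] for d in observed)
    exp_scores = {d: math.exp(score[d] - m) for d in observed}
    z = sum(exp_scores.values())

    # Weighted sum of per-feature embeddings: the output size is always k,
    # no matter how many features were observed.
    out = [0.0] * k
    for d in observed:
        weight = exp_scores[d] / z
        for j in range(k):
            out[j] += weight * (row[d] * proj[d][j] + bias[d][j])
    return out
```

Restricting the softmax to the observed features keeps each available feature's contribution while ignoring missing ones, which is what allows every time step to map to the same dimensionality without imputation.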
Abstract
Irregular multivariate time series data is characterized by varying time intervals between consecutive observations of measured variables/signals (i.e., features) and by varying sampling rates (i.e., frequencies of recordings or measurements) across these features. Modeling time series while accounting for these irregularities remains a challenging task for machine learning methods. Here, we introduce TADA, a Two-stage Aggregation process with Dynamic local Attention that harmonizes time-wise and feature-wise irregularities in multivariate time series. In the first stage, the irregular time series undergoes temporal embedding (TE) using all available features at each time step. This process preserves the contribution of each available feature and generates a fixed-dimensional representation per time step. The second stage introduces a dynamic local attention (DLA) mechanism with adaptive window sizes. DLA aggregates time recordings within feature-specific windows, harmonizing irregular time intervals while capturing each feature's sampling rate. Hierarchical MLP Mixer layers then process the DLA output through multiscale patching, leveraging information at multiple scales for downstream tasks. TADA outperforms state-of-the-art methods on three real-world datasets, including the recent MIMIC-IV dataset, demonstrating its effectiveness in handling irregular multivariate time series and its potential for a range of real-world applications.
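To make the idea of feature-specific windows concrete, the following is a simplified sketch of local attention with a separate window per feature; it is not the TADA implementation. Several assumptions are made for illustration: the function name `dynamic_local_attention` and its parameters are hypothetical, the window sizes are passed in as fixed numbers (in TADA they are adaptive), the aggregation is applied directly to each feature's scalar observations rather than to learned representations, and attention weights come from a softmax over negative time distances scaled by a hypothetical temperature `tau`.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def dynamic_local_attention(
    series: Dict[str, List[Tuple[float, float]]],
    windows: Dict[str, float],
    grid: Sequence[float],
    tau: float = 1.0,
) -> Dict[str, List[Optional[float]]]:
    """Aggregate irregular observations onto a shared time grid (hypothetical sketch).

    series[f]  -- list of (time, value) observations for feature f
    windows[f] -- window width for feature f (fixed here, adaptive in TADA)
    grid       -- reference times at which every feature is summarized
    tau        -- temperature controlling how fast weights decay with distance
    """
    out: Dict[str, List[Optional[float]]] = {}
    for feat, obs in series.items():
        half = windows[feat] / 2.0
        values: List[Optional[float]] = []
        for t in grid:
            # Keep only this feature's observations inside its own window.
            local = [(ti, xi) for ti, xi in obs if abs(ti - t) <= half]
            if not local:
                values.append(None)  # no recording of this feature near t
                continue
            # Closer observations get larger attention weights (stable softmax).
            logits = [-abs(ti - t) / tau for ti, _ in local]
            m = max(logits)
            weights = [math.exp(logit - m) for logit in logits]
            z = sum(weights)
            values.append(sum(w * xi for w, (_, xi) in zip(weights, local)) / z)
        out[feat] = values
    return out
```

As a usage example, a densely sampled feature such as an hourly heart rate can use a narrow window, while a sparsely sampled lab value such as lactate needs a wider one to have any observation near each grid point. Giving each feature its own window is what lets both features be summarized on the same regular grid.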
