A Statistical Approach for Modeling Irregular Multivariate Time Series with Missing Observations

Dingyi Nie, Yixing Wu, C. -C. Jay Kuo

TL;DR

This work proposes a simple yet effective alternative to temporal interpolation and complex architectures for handling irregular time series: extracting time-agnostic summary statistics that eliminate the temporal axis. It also identifies scenarios where missingness patterns themselves encode predictive signal, as in sepsis prediction.

Abstract

Irregular multivariate time series with missing values present significant challenges for predictive modeling in domains such as healthcare. While deep learning approaches often focus on temporal interpolation or complex architectures to handle irregularities, we propose a simpler yet effective alternative: extracting time-agnostic summary statistics to eliminate the temporal axis. Our method computes four key features per variable (the mean and standard deviation of observed values, and the mean and variability of changes between consecutive observations) to create a fixed-dimensional representation. These features are then passed to standard classifiers, such as logistic regression and XGBoost. Evaluated on four biomedical datasets (PhysioNet Challenge 2012 and 2019, PAMAP2, and MIMIC-III), our approach achieves state-of-the-art performance, surpassing recent transformer and graph-based models by 0.5-1.7% in AUROC/AUPRC and 1.1-1.7% in accuracy/F1-score, while reducing computational complexity. Ablation studies demonstrate that feature extraction, not classifier choice, drives the performance gains, and that our summary statistics outperform raw or imputed input in most benchmarks. In particular, we identify scenarios where missingness patterns themselves encode predictive signals, as in sepsis prediction (PhysioNet, 2019), where missing indicators alone achieve 94.2% AUROC with XGBoost, only 1.6% lower than using the original raw data as input. Our results challenge the necessity of complex temporal modeling when task objectives permit time-agnostic representations, providing an efficient and interpretable solution for irregular time series classification.
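
The four per-variable features described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `summarize_variable` and `summarize_series` are hypothetical, "changes" are assumed to be plain differences between consecutive observed values (ignoring the time gap), "variability" is assumed to be the population standard deviation, and variables with too few observations are assumed to fall back to zeros.

```python
from statistics import fmean, pstdev


def summarize_variable(values):
    """Return (mean, std, mean_change, change_std) for one variable.

    `values` is a time-ordered list; None marks a missing observation.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        # Assumption: a never-observed variable maps to all zeros.
        return (0.0, 0.0, 0.0, 0.0)

    # Assumption: changes are differences between consecutive observed
    # values, regardless of how much time separates them.
    changes = [b - a for a, b in zip(observed, observed[1:])]

    mean = fmean(observed)
    std = pstdev(observed)
    # Assumption: with fewer than two observations there is no change,
    # so both change statistics default to zero.
    mean_change = fmean(changes) if changes else 0.0
    change_std = pstdev(changes) if changes else 0.0
    return (mean, std, mean_change, change_std)


def summarize_series(series):
    """Flatten per-variable statistics into one fixed-length vector.

    `series` maps a variable name to its time-ordered list of values.
    Variables are visited in sorted order so the layout is stable.
    """
    features = []
    for name in sorted(series):
        features.extend(summarize_variable(series[name]))
    return features
```

Because the output length is always four times the number of variables, regardless of how many time steps each record contains, the resulting vector can be passed directly to a standard classifier such as logistic regression or XGBoost.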

Paper Structure (16 sections, 6 equations, 1 figure, 5 tables)


Figures (1)

  • Figure 1: Feature importance analysis on the P19 dataset using XGBoost. The importance metric is Total Gain. For each of the four feature types—$\mu^{(0)}$ (Mean), $\sigma^{(0)}$ (STD), $\mu^{(1)}$ (Mean Change), and $\sigma^{(1)}$ (Change Variability)—we summed the gains of all corresponding features, averaged over 5-fold cross-validation.
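
The aggregation described in the Figure 1 caption (sum the gains of all features of one type, then average over folds) can be sketched as below. This is an illustrative assumption, not the authors' code: the feature naming scheme `variable__type`, the type labels in `FEATURE_TYPES`, and the function `gain_by_feature_type` are all hypothetical. The per-fold dictionaries are assumed to have the shape returned by XGBoost's `Booster.get_score(importance_type="total_gain")`, which maps feature names to total gain.

```python
from collections import defaultdict

# Hypothetical labels for the four feature types in the caption:
# mu^(0), sigma^(0), mu^(1), sigma^(1).
FEATURE_TYPES = ("mean", "std", "mean_change", "change_std")


def gain_by_feature_type(fold_gains):
    """Sum total gain per feature type, averaged across folds.

    `fold_gains` is a list with one dict per cross-validation fold,
    each mapping a feature name like "hr__mean" to its total gain.
    """
    if not fold_gains:
        raise ValueError("need at least one fold")

    totals = defaultdict(float)
    for gains in fold_gains:
        for name, gain in gains.items():
            # Assumption: the feature type is the suffix after the
            # last double underscore in the feature name.
            _, feature_type = name.rsplit("__", 1)
            totals[feature_type] += gain

    n_folds = len(fold_gains)
    # Feature types that never appear in a split get an average of 0.
    return {t: totals[t] / n_folds for t in FEATURE_TYPES}
```

Features that a fold's model never splits on are typically absent from the gain dictionary, which is why missing types are treated as contributing zero gain here.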