S4M: S4 for multivariate time series forecasting with Missing values

Jing Peng, Meiqi Yang, Qiong Zhang, Xiaoxiao Li

TL;DR

S4M addresses the challenge of forecasting multivariate time series with block missing data by integrating missing-data handling into an end-to-end S4-based framework. It introduces ATPM to learn robust historical patterns via a prototype bank and MDS-S4 to forecast using dual streams that jointly process latent representations and missingness masks. The approach demonstrates consistent state-of-the-art performance across four real-world datasets under both time-point and variable missing patterns, with ablations confirming the value of the masking pathway and prototype-based representations. The methods offer scalable, efficient forecasting in practical settings where data are incomplete, highlighting a promising direction for end-to-end handling of missing values in time-series modeling.

Abstract

Multivariate time series data play a pivotal role in a wide range of real-world applications. However, the presence of block missing data introduces significant challenges, often compromising the performance of predictive models. Traditional two-step approaches, which first impute missing values and then perform forecasting, are prone to error accumulation, particularly in complex multivariate settings characterized by high missing ratios and intricate dependency structures. In this work, we introduce S4M, an end-to-end time series forecasting framework that seamlessly integrates missing data handling into the Structured State Space Sequence (S4) model architecture. Unlike conventional methods that treat imputation as a separate preprocessing step, S4M leverages the latent space of S4 models to directly recognize and represent missing data patterns, thereby more effectively capturing the underlying temporal and multivariate dependencies. Our framework comprises two key components: the Adaptive Temporal Prototype Mapper (ATPM) and the Missing-Aware Dual Stream S4 (MDS-S4). The ATPM employs a prototype bank to derive robust and informative representations from historical data patterns, while the MDS-S4 processes these representations alongside missingness masks as dual input streams to enable accurate forecasting. Through extensive empirical evaluations on diverse real-world datasets, we demonstrate that S4M consistently achieves state-of-the-art performance. These results underscore the efficacy of our integrated approach in handling missing data, showcasing its robustness and superiority over traditional imputation-based methods. Our findings highlight the potential of S4M to advance reliable time series forecasting in practical applications, offering a promising direction for future research and deployment. Code is available at https://github.com/WINTERWEEL/S4M.git.
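To make the prototype-bank idea more concrete, here is a minimal NumPy sketch of one way a query vector could retrieve a prototype and be fused with it through a linear layer. It is an illustration under stated assumptions, not the authors' implementation: the cosine-similarity lookup, the concatenation before the linear layer, and the names `retrieve_prototype`, `fuse`, `W`, and `b` are all hypothetical, and the bank here is a fixed toy matrix rather than a learned, updated one.

```python
import numpy as np

def retrieve_prototype(query, bank):
    """Return (index, row) of the bank entry most similar to `query`.

    Assumption: similarity is cosine similarity; the paper may use
    a different retrieval rule.
    """
    q = query / (np.linalg.norm(query) + 1e-8)
    b = bank / (np.linalg.norm(bank, axis=1, keepdims=True) + 1e-8)
    idx = int(np.argmax(b @ q))
    return idx, bank[idx]

def fuse(query, prototype, W, b):
    """Combine query and retrieved prototype with one linear layer.

    Assumption: the two vectors are concatenated before the layer.
    """
    return W @ np.concatenate([query, prototype]) + b

# Toy bank with three prototypes in a 3-dimensional latent space.
bank = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])
query = np.array([0.1, 0.9, 0.2])      # closest in direction to the second prototype

idx, proto = retrieve_prototype(query, bank)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 6))            # hypothetical weights, randomly initialised
b = np.zeros(3)
o_t = fuse(query, proto, W, b)         # fused representation for one time step
```

In this toy setup the query is pulled toward the stored pattern it most resembles, which is the intuition behind deriving a more robust representation from historical patterns when the current window is partly missing.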

Paper Structure

This paper contains 38 sections, 9 equations, 10 figures, 25 tables, 4 algorithms.

Figures (10)

  • Figure 1: Comparison of prediction MSE versus training time for various methods on the Electricity dataset. Each method is represented by a dot, with size scaled according to its memory footprint. Lower values for MSE, training time, and memory indicate better performance. Our $\texttt{S4M}$ method demonstrates superior performance across all metrics.
  • Figure 2: Illustration of our end-to-end prediction method $\texttt{S4M}$. Our method consists of two modules. The first module, $\texttt{ATPM}$, uses historical data patterns to learn robust and informative representations for the current input time sequence. Specifically, we extract the local statistics ${\bm{z}}_{t-s:t}$ of the time series at time point $t$ based on raw values ${\bm{x}}_{t-s:t}$. These statistics are then fed into the query encoder ${\color{ForestGreen}{E_q}}$ to obtain ${\bm{q}}_t$, which queries the prototype bank to retrieve the prototype $\hat{{\bm{q}}}_t$. Both ${\bm{q}}_t$ and $\hat{{\bm{q}}}_t$ are subsequently fed into a linear layer to produce the final representation ${\bm{o}}_t$. Additionally, the prototype encoder ${\color{RoyalBlue}{E_p}}$ generates the prototype ${\bm{p}}_t$ for bank updating. In the second module, $\texttt{MDS-S4}$, we model the representation ${\bm{o}}_t$ and the mask ${\bm{m}}_t$ using S4 to generate the forecast ${\bm{y}}_t$.
  • Figure 3: The performance of different methods on four datasets under time point missing scenario when the missing ratio $r$ varies from $0.03$ to $0.24$.
  • Figure 4: The performance of different methods on four datasets under variable missing scenario when the missing ratio $r$ varies from $0.03$ to $0.24$.
  • Figure 5: Illustration of block missing patterns: Time Point Missing (Left) and Variable Missing (Right). Each column represents a variable in the time series, and each row corresponds to observations at a specific time point. Red blocks indicate missing observations, while white blocks represent observed data. Missing values are consecutive in both patterns. For time point missing, all variables are missing at a given time point. For variable missing, some variables may remain observed at the same time point.
  • ...and 5 more figures
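
The two block missing patterns described in the Figure 5 caption can be sketched as binary masks. The snippet below is a hypothetical illustration, not the paper's data-generation code: the function names, the fixed block length `block_len`, and the uniform sampling of block positions are assumptions. Rows are time points, columns are variables, and 1 marks an observed entry while 0 marks a missing one. Because sampled blocks may overlap, the realized missing ratio is at most the target `ratio`.

```python
import numpy as np

def time_point_missing_mask(T, D, ratio, block_len, rng):
    """Blocks of consecutive time points where ALL variables are missing."""
    mask = np.ones((T, D), dtype=np.int8)
    n_blocks = int(round(ratio * T / block_len))
    for _ in range(n_blocks):
        start = rng.integers(0, T - block_len + 1)
        mask[start:start + block_len, :] = 0
    return mask

def variable_missing_mask(T, D, ratio, block_len, rng):
    """Blocks of consecutive time points missing for ONE variable at a time."""
    mask = np.ones((T, D), dtype=np.int8)
    n_blocks = int(round(ratio * T * D / block_len))
    for _ in range(n_blocks):
        start = rng.integers(0, T - block_len + 1)
        var = rng.integers(0, D)
        mask[start:start + block_len, var] = 0
    return mask

rng = np.random.default_rng(0)
tp_mask = time_point_missing_mask(T=100, D=5, ratio=0.12, block_len=4, rng=rng)
var_mask = variable_missing_mask(T=100, D=5, ratio=0.12, block_len=4, rng=rng)

# A forecasting model would then see the masked values and the mask together.
x = rng.normal(size=(100, 5))
x_observed = x * tp_mask   # missing entries zero-filled (an assumption)
```

In the time point pattern every row of the mask is either all ones or all zeros, whereas in the variable pattern a row can mix observed and missing variables, matching the left and right panels described in the caption.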