Spatio-temporal Multivariate Time Series Forecast with Chosen Variables

Zibo Liu, Zhe Jiang, Zelin Xu, Tingsong Xiao, Yupu Zhang, Zhengkun Xiao, Haibo Wang, Shigang Chen

TL;DR

This work defines Spatio-Temporal Multivariate Forecast with Chosen Variables (STCV), addressing the practical constraint of deploying only $m$ permanent sensors among $n$ locations while forecasting all locations. The authors propose VIP (Variable-Parameter Iterative Pruning), a framework that jointly optimizes input variable selection and model sparsity through masked variable-parameter pruning, dynamic extrapolation across all variables, and a prioritized replay mechanism to mitigate forgetting. They provide a formal problem definition, complexity analyses, and a detailed architecture that starts from a base STMF model and progressively prunes both input variables and attention parameters. Experiments on five real-world datasets demonstrate significant gains in forecast accuracy and efficiency, showing how optimized sensor deployment can enhance predictive performance under budget constraints and scale to large spatio-temporal systems.

Abstract

Spatio-Temporal Multivariate time series Forecast (STMF) uses the time series of $n$ spatially distributed variables in a period of recent past to forecast their values in a period of near future. It has important applications in spatio-temporal sensing forecast such as road traffic prediction and air pollution prediction. Recent papers have addressed a practical problem of missing variables in the model input, which arises in the sensing applications where the number $m$ of sensors is far less than the number $n$ of locations to be monitored, due to budget constraints. We observe that the state of the art assumes that the $m$ variables (i.e., locations with sensors) in the model input are pre-determined and the important problem of how to choose the $m$ variables in the input has never been studied. This paper fills the gap by studying a new problem of STMF with chosen variables, which optimally selects $m$-out-of-$n$ variables for the model input in order to maximize the forecast accuracy. We propose a unified framework that jointly performs variable selection and model optimization for both forecast accuracy and model efficiency. It consists of three novel technical components: (1) masked variable-parameter pruning, which progressively prunes less informative variables and attention parameters through quantile-based masking; (2) prioritized variable-parameter replay, which replays low-loss past samples to preserve learned knowledge for model stability; (3) dynamic extrapolation mechanism, which propagates information from variables selected for the input to all other variables via learnable spatial embeddings and adjacency information. Experiments on five real-world datasets show that our work significantly outperforms the state-of-the-art baselines in both accuracy and efficiency, demonstrating the effectiveness of joint variable selection and model optimization.
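
To make component (1) concrete, the following is a minimal sketch of quantile-style masking for variable selection, not the authors' implementation. It assumes each of the $n$ variables carries a scalar importance score (fixed here for illustration, whereas the paper learns such scores during training). The names `progressive_prune`, `scores`, and `steps`, and the linear pruning schedule, are hypothetical.

```python
def progressive_prune(scores, m, steps):
    """Shrink a binary variable mask from n kept variables down to m.

    scores: one importance score per variable (assumed given; hypothetical).
    m:      sensor budget, i.e. number of variables kept at the end.
    steps:  number of pruning rounds (hypothetical linear schedule).
    """
    n = len(scores)
    mask = [1] * n  # start with every variable in the model input
    for step in range(1, steps + 1):
        # Number of variables to keep after this round, moving linearly n -> m.
        k = round(n - (n - m) * step / steps)
        active = [i for i in range(n) if mask[i]]
        # Rank-based form of a quantile threshold: keep the k best active scores.
        keep = set(sorted(active, key=lambda i: scores[i], reverse=True)[:k])
        mask = [1 if i in keep else 0 for i in range(n)]
    return mask


scores = [0.9, 0.1, 0.5, 0.7, 0.3, 0.2]
mask = progressive_prune(scores, m=2, steps=2)
print(mask)  # [1, 0, 0, 1, 0, 0]
```

With six variables and a budget of $m = 2$, the first round drops the two lowest-scoring variables and the second round drops the next two, leaving a binary mask that marks the two chosen sensor locations.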

Paper Structure

This paper contains 24 sections, 18 equations, 6 figures, 8 tables, 1 algorithm.

Figures (6)

  • Figure 1: Problem comparison: (a) STMF, using $n \times l$ input of the past to forecast $n \times l'$ output of the future, (b) STMF with missing data, where some values in the input are missing, (c) STMF with missing variables, where some variables are missing from the input due to budget constraint, using $m \times l$ input to forecast $n \times l'$ output, (d) STCV, allowing optimal selection of the $m$ variables in the input to maximize forecast accuracy.
  • Figure 2: Visualization of traffic volumes at three locations for two days from the PEMS08 dataset.
  • Figure 3: (1) the base STMF model, shown at the top left of the figure. (2) VIP: pruning, shown at the middle left. It learns from the current sample, $S: \left(\mathcal{X}_{N,T}, \mathcal{X}_{N,T'}, \hat{b}, \hat{p}, P_{S}\right)$, and then stores the sample in a buffer, shown at the middle right. (3) VIP: replay, shown at the bottom left. It learns from a replayed sample that is retrieved from the buffer, denoted as $S'': \left(\mathcal{X}''_{N,T}, \mathcal{X}''_{N,T'}, \hat{b}'', \hat{p}'', P_{S''}\right)$, shown at the bottom right. (4) learnable variable and parameter vectors $\hat{b}$, $\hat{p}$, along with their corresponding binary masks $b$, $p$, shown at the top right. (5) adjacency matrix $\hat{A}$, shown at the top right. In (1), (2) and (3) of the figure, we illustrate the temporal embeddings $E_N^{\mathrm{temp}}$, the feature embedding $E_N^{f}$, and the node embeddings $E_N^{\mathrm{node}}$ (gray blocks); spatial or temporal attentions (orange block in (1)); parameter-wise masked attentions (orange blocks in (2) and (3)); and dynamic extrapolation (green blocks).
  • Figure 4: Visualization of sensor locations in the METR-LA dataset.
  • Figure 5: Comparison of our VIP models with the baselines in the accuracy-efficiency tradeoff on the PEMS08 dataset, with a deployment ratio of $m/n=10\%$.
  • ...and 1 more figure
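
The store-then-replay loop described in the Figure 3 caption (learn from the current sample, store it in a buffer, later retrieve a replayed sample) can be sketched as follows. This is an illustration under stated assumptions rather than the paper's algorithm: the class name `LowLossReplayBuffer`, the fixed capacity, and the use of the training loss itself as the priority are all hypothetical, chosen only to match the abstract's description of replaying low-loss past samples.

```python
import heapq
import itertools


class LowLossReplayBuffer:
    """Bounded buffer that retains the lowest-loss samples seen so far."""

    def __init__(self, capacity):
        self.capacity = capacity
        # Min-heap on the negated loss, so the root is the worst (highest-loss) entry.
        self._heap = []
        # Tie-breaker so that two samples with equal loss are never compared directly.
        self._counter = itertools.count()

    def add(self, loss, sample):
        entry = (-loss, next(self._counter), sample)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
        elif loss < -self._heap[0][0]:
            # The new sample beats the worst stored one, so it takes its place.
            heapq.heapreplace(self._heap, entry)

    def replay(self, k):
        """Return up to k stored (loss, sample) pairs, lowest loss first."""
        best = heapq.nsmallest(k, self._heap, key=lambda e: -e[0])
        return [(-neg_loss, sample) for neg_loss, _, sample in best]


buf = LowLossReplayBuffer(capacity=2)
for loss, sample in [(0.5, "a"), (0.2, "b"), (0.9, "c"), (0.1, "d")]:
    buf.add(loss, sample)
print(buf.replay(2))  # [(0.1, 'd'), (0.2, 'b')]
```

In this sketch a stored sample would stand for the tuple in the caption (input window, target window, and the vectors $\hat{b}$, $\hat{p}$ at the time of storage); plain strings are used here only to keep the example short.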