Is Precise Recovery Necessary? A Task-Oriented Imputation Approach for Time Series Forecasting on Variable Subset

Qi Hao, Runchang Liang, Yue Gao, Hao Dong, Wei Fan, Lu Jiang, Pengyang Wang

TL;DR

Task-Oriented Imputation for VSF (TOI-VSF) is a novel framework that incorporates a self-supervised imputation module, agnostic to the forecasting model, designed to fill in missing variables while preserving the vital characteristics and temporal patterns of time series data.

Abstract

Variable Subset Forecasting (VSF) refers to a unique scenario in multivariate time series forecasting, where the variables available in the inference phase are only a subset of the variables in the training phase. VSF presents significant challenges, as entire time series may be missing and neither inter- nor intra-variable correlations persist. Such conditions impede the effectiveness of traditional imputation methods, which primarily focus on filling in individual missing data points. Inspired by the feature-engineering principle that not all variables contribute positively to forecasting, we propose Task-Oriented Imputation for VSF (TOI-VSF), a novel framework that shifts the focus from accurate data recovery to directly supporting the downstream forecasting task. TOI-VSF incorporates a self-supervised imputation module, agnostic to the forecasting model, designed to fill in missing variables while preserving the vital characteristics and temporal patterns of time series data. Additionally, we implement a joint learning strategy for imputation and forecasting, ensuring that the imputation process is directly aligned with and beneficial to the forecasting objective. Extensive experiments across four datasets demonstrate the superiority of TOI-VSF, which outperforms baseline methods by $15\%$ on average.

Paper Structure

This paper contains 23 sections, 12 equations, 18 figures, 3 tables.

Figures (18)

  • Figure 1: Variable Subset Forecasting Problem: The left and right panels show the training and inference phases, respectively. The orange part represents all data in the training process, the grey part represents the missing variables, and the green part represents the available variable subset.
  • Figure 2: An example of the inference phase on the ECG5000 dataset 10.1145/3534678.3539394 using the forecasting backbone MTGNN wu2020connecting. During the inference phase, only a small variable subset is available, while the others are missing. The objective is to obtain a precise prediction for the available variable subset. For simplicity, we take two variables as an example: one available variable (the black solid line) and one missing variable (the grey dotted line) that is negatively correlated with the available one. To handle the missing variable, we apply different imputation strategies, including Gaussian noise-filling (the green dotted line) and the traditional imputation method FDW 10.1145/3534678.3539394 (the orange dotted line). The forecasting performance in the ideal scenario, where no data is missing, is illustrated by the grey solid line. Gaussian imputation yields better forecasting performance than the ideal scenario, suggesting the presence of noise or redundant information in the original data. This implies that precise recovery of the missing data may not be essential for the VSF task and may even weaken forecasting performance.
  • Figure 3: Framework Overview. Left: The training phase. For a multivariate time series $\{ \mathbf{x}^{i}_{t-L:t} \}_{i=1}^{N}$, a random set of $N-S$ variables is masked to obtain a variable subset $\{ \mathbf{u}^{i}_{t-L:t} \}_{i=1}^{S}$. The subset $\Psi_S$ is then fed to the self-supervised learning model to obtain a reconstructed time series $\{ \mathbf{\tilde{u}}^{i}_{t-L:t} \}_{i=1}^{N}$. Subsequently, the reconstructed time series is used for the forecasting task, yielding the prediction results $\{ \mathbf{\hat{x}}^{i}_{t:t+Q} \}_{i=1}^{N}$. Right: The self-supervised variable imputation model. The reconstructed time series $\{ \mathbf{\tilde{u}}^{i}_{t-L:t} \}_{i=1}^{N}$ is obtained by feeding the input variable subset $\{ \mathbf{u}^{i}_{t-L:t} \}_{i=1}^{S}$ into the model.
  • Figure 4: Inference Phase. Unlike the training phase, the test data contains only a variable subset $\{ \mathbf{u}^{i}_{t-L:t} \}_{i=1}^{S}$. Using the trained model, we generate the new time series $\{ \mathbf{\tilde{u}}^{i}_{t-L:t} \}_{i=1}^{S}$ and obtain more accurate predictions $\{ \mathbf{\hat{x}}^{i}_{t:t+Q} \}_{i=1}^{S}$. Notably, the performance of VSF is evaluated only on the variable subset $\Psi_S$.
  • Figure 5: Different Weights of the Self-supervised Learning Module Loss on METR-LA. The horizontal axis is $\alpha$, the proportion of the self-supervised learning loss in the overall loss.
  • ...and 13 more figures
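
The captions for Figures 3 to 5 describe three mechanics: randomly masking $N-S$ variables during training, evaluating only on the available subset $\Psi_S$, and weighting the self-supervised loss by $\alpha$. The following is a minimal NumPy sketch of those three steps, not the authors' implementation. The function names, the zero-filling of masked variables, and the convex combination $\alpha \cdot \text{reconstruction} + (1-\alpha) \cdot \text{forecast}$ are all assumptions made for illustration; the captions state only that $\alpha$ is the proportion of the self-supervised loss in the overall loss.

```python
import numpy as np


def mask_variable_subset(x, num_keep, rng):
    """Keep `num_keep` randomly chosen variables and blank out the rest.

    x has shape (N, L): N variables, L time steps. Masked variables are
    zero-filled here, which is an assumption; the placeholder used in the
    paper is not given in the captions.
    """
    n_vars = x.shape[0]
    keep = np.sort(rng.choice(n_vars, size=num_keep, replace=False))
    masked = np.zeros_like(x)
    masked[keep] = x[keep]
    return masked, keep


def subset_mae(pred, target, keep):
    """Mean absolute error computed only on the available variables."""
    return float(np.mean(np.abs(pred[keep] - target[keep])))


def joint_loss(recon_loss, forecast_loss, alpha):
    """Hypothetical joint objective: alpha is the share of the
    self-supervised (reconstruction) loss in the overall loss."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * recon_loss + (1.0 - alpha) * forecast_loss


rng = np.random.default_rng(0)
x = np.arange(1, 13, dtype=float).reshape(4, 3)  # N=4 variables, L=3 steps
masked, keep = mask_variable_subset(x, num_keep=2, rng=rng)
```

In this sketch, setting `alpha` to 0 trains on the forecasting loss alone and setting it to 1 trains on reconstruction alone, so sweeping it between the two corresponds to the kind of comparison Figure 5 reports.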