
Partial-Multivariate Model for Forecasting

Jaehoon Lee, Hankook Lee, Sungik Choi, Sungjun Cho, Moontae Lee

TL;DR

The paper tackles time-series forecasting with multiple variables by introducing Partial-Multivariate models, a middle ground between univariate and complete-multivariate approaches. It proposes PMformer, a Transformer-based model that learns dependencies only within randomly sampled feature subsets of size $S$ from a full set of $D$ features, using a shared architecture across all subsets. The approach includes training algorithms based on random sampling or partitioning, a simple inference technique that averages over multiple random partitions, and a PAC-Bayes–inspired theoretical analysis explaining why partial-multivariate modeling can generalize well. Empirically, PMformer outperforms a broad set of baselines across seven real-world datasets and exhibits efficiency and robustness to missing features, suggesting practical benefits for scalable, heterogeneous time-series forecasting. The work also discusses limitations and potential broader impacts for foundation models in time-series settings.
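The sampling and inference steps in the TL;DR can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. The names `random_partition`, `forecast_with_partition_averaging`, and the shared `model` callable are hypothetical. When $D$ is not a multiple of $S$, the sketch keeps a smaller last subset, which is also an assumption.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical shared forecaster: takes the histories of the features in one
# subset and returns one forecast (list of future values) per feature, in order.
Model = Callable[[List[Sequence[float]]], List[List[float]]]


def random_partition(num_features: int, subset_size: int, rng: random.Random) -> List[List[int]]:
    """Shuffle feature indices 0..D-1 and split them into chunks of size S.

    Assumption: when D is not a multiple of S, the last chunk is smaller.
    """
    indices = list(range(num_features))
    rng.shuffle(indices)
    return [indices[i:i + subset_size] for i in range(0, num_features, subset_size)]


def forecast_with_partition_averaging(
    series: Sequence[Sequence[float]],
    model: Model,
    subset_size: int,
    num_partitions: int,
    seed: int = 0,
) -> List[List[float]]:
    """Average each feature's forecast over several random partitions.

    Every partition places each feature in exactly one subset, so each feature
    receives exactly `num_partitions` forecasts before averaging.
    """
    rng = random.Random(seed)
    num_features = len(series)
    totals: Dict[int, List[float]] = {}
    for _ in range(num_partitions):
        for subset in random_partition(num_features, subset_size, rng):
            # The same `model` is reused for every subset (shared architecture).
            forecasts = model([series[d] for d in subset])
            for d, forecast in zip(subset, forecasts):
                prev = totals.get(d, [0.0] * len(forecast))
                totals[d] = [a + b for a, b in zip(prev, forecast)]
    return [[t / num_partitions for t in totals[d]] for d in range(num_features)]
```

Each training step only needs one call to `random_partition`, or a single sampled subset. The averaging loop mirrors the inference technique described above, which averages over multiple random partitions.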

Abstract

When solving forecasting problems involving multiple time-series features, existing approaches often fall into two extreme categories, depending on whether they utilize inter-feature information: univariate and complete-multivariate models. Unlike univariate models, which ignore this information, complete-multivariate models compute relationships among the complete set of features. However, despite the potential advantage of leveraging this additional information, complete-multivariate models sometimes underperform univariate ones. Therefore, our research explores a middle ground between the two by introducing what we term Partial-Multivariate models, in which a neural network captures only partial relationships, that is, dependencies within subsets of all features. To this end, we propose PMformer, a Transformer-based partial-multivariate model, along with its training algorithm. We demonstrate that PMformer outperforms various univariate and complete-multivariate models, and we provide a theoretical rationale and empirical analysis for its superiority. Additionally, we propose an inference technique for PMformer that further enhances forecasting accuracy. Finally, we highlight two other advantages of PMformer: efficiency and robustness under missing features.
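The three model categories can be pictured as boolean masks that record which features may exchange information. This framing is an illustration, not taken from the paper, and the function names below are hypothetical. Consider a partition of $D$ features into $D/S$ subsets of size $S$. Univariate models allow $D$ pairs and complete-multivariate models allow $D^2$. The partial mask allows $(D/S)\cdot S^2 = D\cdot S$ pairs, which sits between the two. This counting is one way to see why pairwise feature interactions get cheaper as $S$ shrinks.

```python
from typing import Dict, List, Optional


def interaction_mask(kind: str, num_features: int, partition: Optional[List[List[int]]] = None) -> List[List[bool]]:
    """Return mask[i][j] = True when feature i may use information from feature j."""
    n = num_features
    if kind == "univariate":
        # Each feature only sees itself.
        return [[i == j for j in range(n)] for i in range(n)]
    if kind == "complete":
        # Every feature sees every other feature.
        return [[True] * n for _ in range(n)]
    if kind == "partial":
        if partition is None:
            raise ValueError("a partial-multivariate mask needs a partition")
        # Features interact only with members of their own subset.
        group: Dict[int, int] = {d: g for g, subset in enumerate(partition) for d in subset}
        return [[group[i] == group[j] for j in range(n)] for i in range(n)]
    raise ValueError(f"unknown model type: {kind}")


def count_interactions(mask: List[List[bool]]) -> int:
    """Number of (i, j) feature pairs allowed to interact."""
    return sum(sum(row) for row in mask)


D, S = 6, 3
partition = [[0, 1, 2], [3, 4, 5]]
for kind in ("univariate", "partial", "complete"):
    print(kind, count_interactions(interaction_mask(kind, D, partition)))
```

With $D = 6$ and $S = 3$, this prints 6, 18 and 36 allowed pairs, that is, $D$, $D\cdot S$ and $D^2$.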

Paper Structure

This paper contains 28 sections, 4 theorems, 12 equations, 14 figures, 11 tables, 2 algorithms.

Key Result

Theorem 1

Under certain assumptions, with probability at least $1-\delta$ over the selection of the sample $\mathcal{T}$, the following bound holds for the generalized loss $l(\mathbf{Q})$ under posterior distribution $\mathbf{Q}$, where $H(\mathbf{Q})$ is the entropy of $\mathbf{Q}$ (i.e., $H(\mathbf{Q}) = E_{h\sim \mathbf{Q}}[-\log \mathbf{Q}(h)]$) and $C$ is a constant.
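The entropy term in Theorem 1 is defined as an expectation over the posterior. The theorem concerns general posteriors $\mathbf{Q}$. Purely for illustration, the sketch below assumes a discrete posterior over finitely many hypotheses, where the expectation becomes a finite sum. It also assumes the natural logarithm, since the base is not stated.

```python
import math
from typing import Sequence


def entropy(q: Sequence[float]) -> float:
    """H(Q) = E_{h~Q}[-log Q(h)] for a discrete Q, in nats (natural log assumed)."""
    if any(p < 0 for p in q) or not math.isclose(sum(q), 1.0):
        raise ValueError("q must be a probability vector")
    # Terms with Q(h) = 0 contribute 0, by the convention 0 * log 0 = 0.
    return sum(-p * math.log(p) for p in q if p > 0)


print(entropy([1.0, 0.0, 0.0, 0.0]))      # point mass: zero entropy
print(entropy([0.25, 0.25, 0.25, 0.25]))  # uniform over 4 hypotheses: log(4) ≈ 1.386
```

A posterior concentrated on one hypothesis has zero entropy. A posterior spread uniformly over $k$ hypotheses has the maximum value, $\log k$.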

Figures (14)

  • Figure 1: Visualization of three types of models. The complete-multivariate model processes the complete set of features simultaneously, taking their relationships into account. The univariate model treats each feature as a separate input to a shared neural network and disregards those relationships. In the partial-multivariate model, several subsets of size $S$ are sampled from the complete feature set, and relationships are captured only within each subset; a single neural network is shared by all sampled subsets.
  • Figure 2: Architecture of Partial-Multivariate Transformer (PMformer). To emphasize row-wise attention operations, we enclose each row within bold frames before feeding them into the attention modules. In this figure, the subset size $S$ is 3.
  • Figure 3: Test MSE when varying $S$.
  • Figure 4: Test MSE when varying $|\mathbf{F}^{all}|$ with $S$ fixed.
  • Figure 5: The effect of $N_I$ on test MSE when (a) $S$ is fixed to the selected hyperparameter and (b) $S$ varies. For (b), the y-axis shows the difference in test MSE between $N_E \in \{1,2,4,8,16,32,64,128\}$ and $N_E=128$.
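Figures 1 and 2 state that relationships are captured only within each sampled subset. One generic way to impose that restriction is to run self-attention separately on each subset's feature tokens. The sketch below assumes each feature is summarized by a single embedding vector and uses identity query/key/value projections for brevity. It illustrates subset-restricted attention only; it does not reproduce PMformer's actual layers. The embeddings and partition are made-up example values.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def subset_self_attention(tokens: List[List[float]]) -> List[List[float]]:
    """Scaled dot-product self-attention among the tokens of ONE subset.

    Identity query/key/value projections are an assumption made for brevity;
    tokens from other subsets never enter this computation.
    """
    dim = len(tokens[0])
    outputs = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in tokens]
        weights = softmax(scores)
        # Each output is a convex combination of the subset's value vectors.
        outputs.append([sum(w * value[j] for w, value in zip(weights, tokens)) for j in range(dim)])
    return outputs


# Hypothetical example: D = 6 features split into subsets of size S = 3,
# each feature summarized by a 2-dimensional embedding.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [0.0, 2.0], [1.0, 2.0]]
partition = [[0, 2, 4], [1, 3, 5]]
mixed: Dict[int, List[float]] = {}
for subset in partition:
    for d, out in zip(subset, subset_self_attention([embeddings[d] for d in subset])):
        mixed[d] = out
```

Because attention runs subset by subset, feature 0's output depends only on features 0, 2 and 4. In this example, features 1, 3 and 5 cannot influence it.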

Theorems & Definitions (8)

  • Theorem 1
  • Theorem 2
  • Proof
  • Lemma 1
  • Proof
  • Proof
  • Theorem 3
  • Proof