Partial-Multivariate Model for Forecasting

Jaehoon Lee; Hankook Lee; Sungik Choi; Sungjun Cho; Moontae Lee

TL;DR

The paper tackles time-series forecasting with multiple variables by introducing Partial-Multivariate models, a middle ground between univariate and complete-multivariate approaches. It proposes PMformer, a Transformer-based model that learns dependencies only within randomly sampled feature subsets of size $S$ from a full set of $D$ features, using a shared architecture across all subsets. The approach includes training algorithms based on random sampling or partitioning, a simple inference technique that averages over multiple random partitions, and a PAC-Bayes–inspired theoretical analysis explaining why partial-multivariate modeling can generalize well. Empirically, PMformer outperforms a broad set of baselines across seven real-world datasets and exhibits efficiency and robustness to missing features, suggesting practical benefits for scalable, heterogeneous time-series forecasting. The work also discusses limitations and potential broader impacts for foundation models in time-series settings.
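The partitioning and averaging steps described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the names `random_partition`, `ensemble_forecast`, and `persistence_model` are hypothetical, the toy model stands in for PMformer, and the sketch assumes $D$ is divisible by $S$.

```python
import random
from typing import Callable, List

Series = List[float]  # one feature's values over time


def random_partition(num_features: int, subset_size: int,
                     rng: random.Random) -> List[List[int]]:
    """Shuffle the D feature indices and cut them into disjoint subsets of size S."""
    if num_features % subset_size != 0:
        # Assumption of this sketch only; the source does not state this restriction.
        raise ValueError("this sketch assumes D is divisible by S")
    order = list(range(num_features))
    rng.shuffle(order)
    return [order[i:i + subset_size]
            for i in range(0, num_features, subset_size)]


def ensemble_forecast(model: Callable[[List[Series]], List[Series]],
                      history: List[Series], subset_size: int,
                      num_partitions: int, seed: int = 0) -> List[Series]:
    """Average forecasts over several random partitions of the features.

    `model` maps the histories of one subset to one forecast per feature;
    the same model is applied to every subset.
    """
    rng = random.Random(seed)
    num_features = len(history)
    sums: List[Series] = [[] for _ in range(num_features)]
    for _ in range(num_partitions):
        for subset in random_partition(num_features, subset_size, rng):
            preds = model([history[i] for i in subset])
            for idx, pred in zip(subset, preds):
                if not sums[idx]:
                    sums[idx] = [0.0] * len(pred)
                sums[idx] = [a + b for a, b in zip(sums[idx], pred)]
    # Each feature appears exactly once per partition, so divide by the count.
    return [[v / num_partitions for v in s] for s in sums]


def persistence_model(subset_history: List[Series]) -> List[Series]:
    """Toy stand-in for a trained model: repeat each feature's last value twice."""
    return [[series[-1]] * 2 for series in subset_history]


if __name__ == "__main__":
    history = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]  # D = 4 features
    print(ensemble_forecast(persistence_model, history,
                            subset_size=2, num_partitions=3))
```

Because the toy model ignores the other features in its subset, the averaged forecast is the same for any partition. A model that uses inter-feature information would generally give different forecasts for different partitions, which is what averaging over several of them smooths out.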

Abstract

When solving forecasting problems that involve multiple time-series features, existing approaches often fall into two extreme categories, depending on whether they use inter-feature information: univariate and complete-multivariate models. Unlike univariate models, which ignore this information, complete-multivariate models compute relationships among the complete set of features. However, despite the potential advantage of leveraging the additional information, complete-multivariate models sometimes underperform univariate ones. Our research therefore explores a middle ground between the two by introducing what we term Partial-Multivariate models, in which a neural network captures only partial relationships, that is, dependencies within subsets of all features. To this end, we propose PMformer, a Transformer-based partial-multivariate model, together with its training algorithm. We demonstrate that PMformer outperforms various univariate and complete-multivariate models, and we provide a theoretical rationale and empirical analysis for its superiority. We also propose an inference technique for PMformer that further improves forecasting accuracy. Finally, we highlight other advantages of PMformer: efficiency and robustness under missing features.

Paper Structure (28 sections, 4 theorems, 12 equations, 14 figures, 11 tables, 2 algorithms)

Introduction
Related Works
Method
Partial-Multivariate Forecasting Model
PMformer
Training Algorithm for PMformer
Inference Technique for PMformer
Theoretical Analysis on PMformer
Experiments
Experimental Setup
Forecasting Result
Analysis
Conclusion
Proof
Proof for Theorem 1
...and 13 more sections

Key Result

Theorem 1

Under some assumptions, with probability at least $1-\delta$ over the selection of the sample $\mathcal{T}$, a bound on the generalized loss $l(\mathbf{Q})$ holds under posterior distributions $\mathbf{Q}$; in this bound, $H(\mathbf{Q})$ is the entropy of $\mathbf{Q}$ (i.e., $H(\mathbf{Q}) = E_{h\sim \mathbf{Q}}[-\log \mathbf{Q}(h)]$) and $C$ is a constant.
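
For orientation only, a standard PAC-Bayes bound shows how an entropy term of this kind can arise. The block below is not the paper's Theorem 1, whose exact form and assumptions may differ. The prior $\mathbf{P}$, the finite hypothesis class $\mathcal{H}$, the empirical loss $\hat{l}$, and the assumption of a loss bounded in $[0,1]$ with i.i.d. samples are all introduced here for illustration. In the McAllester form with Maurer's refinement, with probability at least $1-\delta$ over $\mathcal{T}$, simultaneously for all posteriors $\mathbf{Q}$:

$$
l(\mathbf{Q}) \;\le\; \hat{l}(\mathbf{Q}, \mathcal{T}) + \sqrt{\frac{\mathrm{KL}(\mathbf{Q}\,\|\,\mathbf{P}) + \log\frac{2\sqrt{|\mathcal{T}|}}{\delta}}{2|\mathcal{T}|}}
$$

If $\mathbf{P}$ is uniform over a finite class $\mathcal{H}$, the divergence reduces to an entropy term:

$$
\mathrm{KL}(\mathbf{Q}\,\|\,\mathbf{P}) = \sum_{h\in\mathcal{H}} \mathbf{Q}(h)\log\frac{\mathbf{Q}(h)}{1/|\mathcal{H}|} = \log|\mathcal{H}| - H(\mathbf{Q})
$$

Under these illustrative assumptions, a posterior with higher entropy gives a smaller complexity term, and $\log|\mathcal{H}|$ plays the role of a constant.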

Figures (14)

Figure 1: Visualization of three types of models. The complete-multivariate model processes the complete set of features simultaneously, taking their relationships into account, whereas the univariate model treats each feature as a separate input to a shared neural network and disregards relationships. In the partial-multivariate model, several subsets of size $S$ are sampled from the complete feature set and relationships are captured only within each subset. Note that a single neural network is shared by all sampled subsets.
Figure 2: Architecture of Partial-Multivariate Transformer (PMformer). To emphasize row-wise attention operations, we enclose each row within bold frames before feeding them into the attention modules. In this figure, the subset size $S$ is 3.
Figure 3: Test MSE by changing $S$.
Figure 4: Test MSE by changing $|\mathbf{F}^{all}|$, fixing $S$.
Figure 5: The effect of $N_I$ on test MSE when (a) $S$ is fixed to the selected hyperparameter and (b) $S$ changes. For (b), the y-axis shows the difference in test MSE between $N_E \in \{1,2,4,8,16,32,64,128\}$ and $N_E=128$.
...and 9 more figures

Theorems & Definitions (8)

Theorem 1
Theorem 2
proof
Lemma 1
proof
proof
Theorem 3
proof
