Forecasting time series with constraints

Nathan Doumèche, Francis Bach, Éloi Bedek, Gérard Biau, Claire Boyer, Yannig Goude

TL;DR

This work introduces WeaKL, a unified framework for forecasting time series under linear constraints by formulating the learning task as minimizing a constrained empirical risk with a data term plus a penalty. The key advance is an exact, closed-form minimizer computable via linear algebra: $\hat{\theta}=\left(\left(\sum_{t_j}\mathbb{\Phi}_{t_j}^*\Lambda^*\Lambda\mathbb{\Phi}_{t_j}\right)+nM^*M\right)^{-1}\left(\sum_{t_j}\mathbb{\Phi}_{t_j}^*\Lambda^*\Lambda Y_{t_j}\right)$, enabling GPU-accelerated, scalable learning. The framework distinguishes shape constraints (e.g., additive models, online adaptation) from learning constraints (e.g., transfer learning, hierarchical forecasting, differential priors), and demonstrates three estimators (WeaKL-BU, WeaKL-G, and WeaKL-T) to enforce hierarchy and transfer information. Through two electricity-load forecasting use cases and a tourism forecasting case, WeaKL consistently achieves state-of-the-art or competitive performance while offering interpretable, decomposable models and fast runtimes. The paper also provides extensive methodological details, proofs, and ablations, along with public code, underscoring the practical impact of constraint-aware, kernel-like time-series learning on GPU hardware.

Abstract

Time series forecasting presents unique challenges that limit the effectiveness of traditional machine learning algorithms. To address these limitations, various approaches have incorporated linear constraints into learning algorithms, such as generalized additive models and hierarchical forecasting. In this paper, we propose a unified framework for integrating and combining linear constraints in time series forecasting. Within this framework, we show that the exact minimizer of the constrained empirical risk can be computed efficiently using linear algebra alone. This approach allows for highly scalable implementations optimized for GPUs. We validate the proposed methodology through extensive benchmarking on real-world tasks, including electricity demand forecasting and tourism forecasting, achieving state-of-the-art performance.

Paper Structure

This paper contains 63 sections, 4 theorems, 55 equations, 5 figures, 5 tables.

Key Result

Proposition 2.1

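Suppose both $M$ and $\Lambda$ are injective. Then there is a unique minimizer of the constrained empirical risk, which takes the form $\hat{\theta}=\left(\left(\sum_{t_j}\mathbb{\Phi}_{t_j}^*\Lambda^*\Lambda\mathbb{\Phi}_{t_j}\right)+nM^*M\right)^{-1}\left(\sum_{t_j}\mathbb{\Phi}_{t_j}^*\Lambda^*\Lambda Y_{t_j}\right)$, where $\mathbb{\Phi}_t$ is the $d_2\times \dim(\theta)$ block-wise diagonal feature matrix at time $t$.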
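
To make the closed-form minimizer of Proposition 2.1 concrete, here is a minimal NumPy sketch that assembles the matrix $\left(\sum_{t_j}\mathbb{\Phi}_{t_j}^*\Lambda^*\Lambda\mathbb{\Phi}_{t_j}\right)+nM^*M$ and the right-hand side $\sum_{t_j}\mathbb{\Phi}_{t_j}^*\Lambda^*\Lambda Y_{t_j}$, then solves the resulting linear system. It is not the authors' implementation. The function names, array layouts, and toy data are illustrative assumptions, and the risk used for the sanity check, $\frac{1}{n}\sum_{t_j}\|\Lambda(\mathbb{\Phi}_{t_j}\theta-Y_{t_j})\|_2^2+\|M\theta\|_2^2$, is an assumed form chosen to be consistent with the formula above.

```python
import numpy as np

def weakl_closed_form(Phi, Y, Lam, M):
    """Solve (sum_t Phi_t^* Lam^* Lam Phi_t + n M^* M) theta = sum_t Phi_t^* Lam^* Lam Y_t.

    Assumed (hypothetical) layouts:
      Phi: (n, d2, p) feature matrices, one per time step
      Y:   (n, d2)    targets
      Lam: (q, d2)    weighting matrix Lambda
      M:   (m, p)     penalty matrix
    """
    n = Phi.shape[0]
    LPhi = Lam @ Phi        # (n, q, p): Lambda Phi_t for every t (broadcast matmul)
    LY = Y @ Lam.T          # (n, q):    Lambda Y_t for every t
    # Conjugation keeps the sketch valid for complex features as well.
    A = np.einsum("tqi,tqj->ij", LPhi.conj(), LPhi) + n * (M.conj().T @ M)
    b = np.einsum("tqi,tq->i", LPhi.conj(), LY)
    # Solving the system is preferable to forming the inverse explicitly.
    return np.linalg.solve(A, b)

def empirical_risk(theta, Phi, Y, Lam, M):
    """Assumed risk: weighted mean squared error plus quadratic penalty."""
    resid = (Phi @ theta - Y) @ Lam.T   # (n, q): Lambda (Phi_t theta - Y_t)
    data_term = np.sum(np.abs(resid) ** 2) / Phi.shape[0]
    penalty = np.sum(np.abs(M @ theta) ** 2)
    return data_term + penalty

# Toy data (synthetic, for illustration only).
rng = np.random.default_rng(0)
n, d2, p = 50, 2, 4
Phi = rng.normal(size=(n, d2, p))
Y = rng.normal(size=(n, d2))
Lam = np.eye(d2)          # injective weighting
M = 0.1 * np.eye(p)       # injective penalty
theta_hat = weakl_closed_form(Phi, Y, Lam, M)
```

Because the whole computation reduces to matrix products and one linear solve, the same steps map directly onto GPU array libraries, which is the scalability argument made in the abstract.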

Figures (5)

  • Figure 1: Effect in MW of the temperature in the additive WeaKL.
  • Figure 2: Error $Y_t - \hat{Y}_t$ in MW of the WeaKLs on the test period including holidays. Dots represent individual observations, while the bold curves indicate the one-week moving averages.
  • Figure 3: Graph representing the hierarchy of Australian domestic tourism.
  • Figure 4: Hierarchical forecasting performance with $2d/n = 0.5$.
  • Figure 5: Hierarchical forecasting performance with $2d/n = 0.95$.

Theorems & Definitions (5)

  • Proposition 2.1: Empirical risk minimizer.
  • Lemma 1.1: Full rank.
  • Lemma 1.2: Orthogonal projection.
  • Proposition 1.3: Constrained estimators perform better.
  • Remark 1.4