$\clubsuit$ CLOVER $\clubsuit$: Probabilistic Forecasting with Coherent Learning Objective Reparameterization

Kin G. Olivares; Geoffrey Négiar; Ruijun Ma; O. Nangba Meetei; Mengfei Cao; Michael W. Mahoney

TL;DR

CLOVER addresses probabilistic hierarchical forecasting by embedding a coherent multivariate Gaussian factor model within an MQForecaster backbone, enabling end-to-end training with differentiable samples. The method enforces exact coherence through a linear aggregation structure and, via the reparameterization trick, allows optimization of arbitrary differentiable objectives, notably CRPS and the Energy Score. Empirical results on six public datasets show substantial improvements in probabilistic accuracy (average scaled CRPS gains of about 15%) and in mean forecast accuracy, with the largest gains in hierarchically rich and highly correlated settings. The work demonstrates that end-to-end coherent forecasting is practical and scalable, and it offers a flexible framework for incorporating cross-series information and alternative scoring metrics in operational forecasting tasks.

Abstract

Obtaining accurate probabilistic forecasts is an operational challenge in many applications, such as energy management, climate forecasting, supply chain planning, and resource allocation. Many of these applications present a natural hierarchical structure over the forecasted quantities, and forecasting systems that adhere to this hierarchical structure are said to be coherent. Furthermore, operational planning benefits from accuracy at all levels of the aggregation hierarchy. However, building accurate and coherent forecasting systems is challenging: classic multivariate time series tools and neural network methods are still being adapted for this purpose. In this paper, we augment an MQForecaster neural network architecture with a modified multivariate Gaussian factor model that achieves coherence by construction. The factor model samples can be differentiated with respect to the model parameters, allowing optimization on arbitrary differentiable learning objectives that align with the forecasting system's goals, including quantile loss and the scaled Continuous Ranked Probability Score (CRPS). We call our method the Coherent Learning Objective Reparameterization Neural Network (CLOVER). In comparison to state-of-the-art coherent forecasting methods, CLOVER achieves significant improvements in scaled CRPS forecast accuracy, with average gains of 15%, as measured on six publicly available datasets.
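To make "coherence by construction" concrete, the following is a minimal NumPy sketch of sampling from a Gaussian factor model and aggregating each sample. It is not the authors' implementation: the function name `sample_coherent`, the parameterization (means `mu`, idiosyncratic scales `sigma`, factor loadings `F`), and the toy one-level hierarchy are all assumptions made for illustration. NumPy only shows the sampling structure; in an automatic differentiation framework the same expression would be differentiable with respect to `mu`, `sigma`, and `F`, because the noise draws do not depend on those parameters.

```python
import numpy as np

def sample_coherent(mu, sigma, F, S, n_samples, rng):
    """Draw coherent samples from an illustrative Gaussian factor model.

    mu:    (N_b,)            bottom-level means (assumed parameterization)
    sigma: (N_b,)            bottom-level idiosyncratic scales
    F:     (N_b, K)          factor loadings
    S:     (N_a + N_b, N_b)  aggregation matrix
    """
    n_bottom, n_factors = F.shape
    # Parameter-free standard normal noise: the reparameterization trick.
    z = rng.standard_normal((n_samples, n_bottom))     # per-series noise
    eps = rng.standard_normal((n_samples, n_factors))  # factor noise shared by all bottom series
    y_bottom = mu + sigma * z + eps @ F.T              # shape (n_samples, N_b)
    # Aggregating every sample with S makes each sample coherent.
    return y_bottom @ S.T                              # shape (n_samples, N_a + N_b)

# Toy hierarchy (assumption): a single total over four bottom series.
S = np.vstack([np.ones((1, 4)), np.eye(4)])
mu = np.array([10.0, 20.0, 30.0, 40.0])
sigma = np.array([1.0, 2.0, 3.0, 4.0])
F = np.full((4, 2), 0.5)

rng = np.random.default_rng(0)
samples = sample_coherent(mu, sigma, F, S, n_samples=1000, rng=rng)
```

Each row of `samples` holds the total followed by the four bottom series, and the total equals the sum of the bottom entries in every row. A sample-based loss computed on `samples` therefore scores all levels of the hierarchy at once.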

Paper Structure (34 sections, 2 theorems, 31 equations, 10 figures, 6 tables)

Introduction
Hierarchical Forecast Task.
Notation.
Probabilistic Forecast Task.
Hierarchical Forecast Scoring Rule.
Methodology
Coherent Probabilistic Model
Neural Network Architecture
Learning Objective
Discussion
Empirical Evaluation
Setting
Datasets.
Evaluation metrics.
Baseline Models.
...and 19 more sections

Key Result

Lemma A.1

Let $(\Omega_{[b]}, \mathcal{F}_{[b]}, \mathbb{P}_{[b]})$ be a probabilistic forecast space with $\mathcal{F}_{[b]}$ a $\sigma$-algebra in $\Omega_{[b]}$. If a forecast distribution $\mathbb{P}_{[i]}$ assigns a zero probability to sets that do not contain coherent forecasts, it defines a coherent pr

Figures (10)

Figure 1: A simple time series hierarchical structure with $N_a=3$ aggregates over $N_b=4$ bottom time series. The graph panel shows the disaggregated bottom variables with a blue background. The matrix panel (right) shows the corresponding hierarchical aggregation constraints matrix, with horizontal lines separating the levels of the hierarchy. We decompose our evaluation across the levels.
Figure 2: The Coherent Learning Objective Reparameterization Neural Network is a Sequence-to-Sequence with Context network that uses dilated temporal convolutions as the primary encoder and multilayer-perceptron-based decoders to create the multi-step forecast. CLOVER coherently aggregates the samples of the factor model, $\mathbf{\tilde{y}}_{[i],\eta,t}=\mathbf{S}_{[i][b]}\mathbf{\hat{y}}_{[b],\eta,t}$. The standard normal samples, marked in red, are parameter-free, so the reparameterization trick allows backpropagation through the factor model outputs. CLOVER extends the univariate MQCNN through the cross-series multilayer perceptron.
Figure 3: Ablation studies on the Bay Area Traffic dataset: (a) In highly correlated hierarchies, the VAR inputs enabled by CLOVER significantly improve over the univariate MQCNN's accuracy. (b) The factor model CRPS learning objective demonstrates clear advantages over the classic negative log-likelihood. Full ablation studies are described in the ablation studies appendix.
Figure 4: PyTorch function for sampling from our Gaussian Factor model. Note that the factor samples are shared across all bottom-level distributions. The samples are differentiable with regard to the function inputs. We can easily adapt this function to sample from other distributions.
Figure 5: Hierarchical constraints of the empirical evaluation datasets. (a) Labour reports 57 series of the number of employees by full-time status, gender, and geographic levels. (b) Traffic organizes the occupancy series of 200 highways into quarters, halves, and totals. (c) Tourism-S categorizes its 89 regional visit series based on travel purpose, zone, state, and country-level aggregations, and urbanization within regions. (d) Tourism-L categorizes its 555 regional visit series based on travel purpose and on zone, state, and country-level geographical aggregations. (e) Wiki groups 150 daily visit series for Wikipedia articles by language and article categorical taxonomy. (f) Favorita classifies its grocery sales by store, city, state, and country levels.
...and 5 more figures
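
The aggregation constraints matrix described in the Figure 1 caption can be illustrated with a small NumPy sketch. The caption gives only the sizes ($N_a=3$ aggregates over $N_b=4$ bottom series), so the layout below (one total plus two groups of two) is an assumption, and the names `A`, `S`, and `is_coherent` are hypothetical helpers rather than anything taken from the paper.

```python
import numpy as np

# Assumed layout (hypothetical): one total and two groups of two bottom series.
A = np.array([
    [1, 1, 1, 1],  # total   = b1 + b2 + b3 + b4
    [1, 1, 0, 0],  # group 1 = b1 + b2
    [0, 0, 1, 1],  # group 2 = b3 + b4
])
# Stack the aggregate rows on top of an identity block for the bottom series.
S = np.vstack([A, np.eye(4, dtype=int)])  # shape (N_a + N_b, N_b) = (7, 4)

def is_coherent(y, A, atol=1e-8):
    """Return True if the aggregate entries of y equal A times its bottom entries."""
    n_agg = A.shape[0]
    return bool(np.allclose(y[:n_agg], A @ y[n_agg:], atol=atol))

y_bottom = np.array([1.0, 2.0, 3.0, 4.0])
y_all = S @ y_bottom  # coherent by construction
y_bad = y_all.copy()
y_bad[0] += 1.0       # perturb the total so it no longer matches the bottom sum
```

Here `y_all` stacks the three aggregates above the four bottom values, so `is_coherent(y_all, A)` holds while `is_coherent(y_bad, A)` does not.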

Theorems & Definitions (6)

Definition 2.1
Definition 3.1
Lemma A.1
proof
Lemma A.2
proof
