Efficient pooling of predictions via kernel embeddings

Sam Allen, David Ginsbourger, Johanna Ziegel

TL;DR

Embedding predictions into a reproducing kernel Hilbert space (RKHS) shows that estimating the linear pool weights that optimise a kernel-based scoring rule is a convex quadratic optimisation problem. The linear pool can therefore be implemented efficiently when optimally combining predictions on arbitrary outcome domains.

Abstract

Probabilistic predictions are probability distributions over the set of possible outcomes. Such predictions quantify the uncertainty in the outcome, making them essential for effective decision making. By combining multiple predictions, the information sources used to generate the predictions are pooled, often resulting in a more informative forecast. Probabilistic predictions are typically combined by linearly pooling the individual predictive distributions; this encompasses several ensemble learning techniques, for example. The weights assigned to each prediction can be estimated based on their past performance, allowing more accurate predictions to receive a higher weight. This can be achieved by finding the weights that optimise a proper scoring rule over some training data. By embedding predictions into a Reproducing Kernel Hilbert Space (RKHS), we illustrate that estimating the linear pool weights that optimise kernel-based scoring rules is a convex quadratic optimisation problem. This permits an efficient implementation of the linear pool when optimally combining predictions on arbitrary outcome domains. This result also holds for other combination strategies, and we additionally study a flexible generalisation of the linear pool that overcomes some of its theoretical limitations, whilst allowing an efficient implementation within the RKHS framework. These approaches are compared in an application to operational wind speed forecasts, where this generalisation is found to offer substantial improvements upon the traditional linear pool.
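To make the convexity claim concrete, the following is a minimal sketch rather than the authors' implementation. It assumes sample-based (ensemble) predictions on the real line, a Gaussian kernel, the common convention $S_{k}(F, y) = \tfrac{1}{2}\mathbb{E}k(X, X') - \mathbb{E}k(X, y) + \tfrac{1}{2}k(y, y)$ for the kernel score, synthetic data, and a simple projected-gradient solver. All function names are hypothetical. Under these assumptions the mean score of the linear pool is $\tfrac{1}{2} w^{\top} G w - w^{\top} b$ plus a constant, where $G$ is a positive semidefinite Gram matrix of the embedded predictions, so the weights solve a convex quadratic programme over the simplex.

```python
import math
import random


def gauss_kernel(x, z, bandwidth=1.0):
    """Gaussian kernel on the real line (an assumed choice for illustration)."""
    return math.exp(-((x - z) ** 2) / (2.0 * bandwidth ** 2))


def quadratic_terms(ensembles, observations, kernel=gauss_kernel):
    """Return (G, b) so that the mean kernel score of the linear pool equals
    0.5 * w'Gw - w'b + constant. ensembles[t][j] holds the sample members of
    component j at training time t."""
    n, J = len(observations), len(ensembles[0])
    G = [[0.0] * J for _ in range(J)]
    b = [0.0] * J
    for members, y in zip(ensembles, observations):
        for i in range(J):
            b[i] += sum(kernel(x, y) for x in members[i]) / (len(members[i]) * n)
            for j in range(J):
                total = sum(kernel(x, z) for x in members[i] for z in members[j])
                G[i][j] += total / (len(members[i]) * len(members[j]) * n)
    return G, b


def objective(w, G, b):
    """Weight-dependent part of the mean kernel score."""
    J = len(w)
    quad = sum(w[i] * G[i][j] * w[j] for i in range(J) for j in range(J))
    return 0.5 * quad - sum(w[j] * b[j] for j in range(J))


def project_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1} (sort-based)."""
    theta, cumulative = 0.0, 0.0
    for i, u in enumerate(sorted(v, reverse=True), start=1):
        cumulative += u
        candidate = (cumulative - 1.0) / i
        if u - candidate > 0.0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def fit_weights(G, b, steps=2000):
    """Projected gradient descent, started from equal weights."""
    J = len(b)
    # G is positive semidefinite, so its trace bounds its largest eigenvalue
    # and 1 / trace is a safe step size.
    step = 1.0 / max(sum(G[j][j] for j in range(J)), 1e-12)
    w = [1.0 / J] * J
    for _ in range(steps):
        grad = [sum(G[i][j] * w[j] for j in range(J)) - b[i] for i in range(J)]
        w = project_simplex([w[i] - step * grad[i] for i in range(J)])
    return w


# Synthetic training data (illustrative only): one calibrated component and
# one component with a constant bias of +3.
random.seed(0)
observations, ensembles = [], []
for _ in range(50):
    centre = random.gauss(0.0, 2.0)
    observations.append(random.gauss(centre, 1.0))
    calibrated = [random.gauss(centre, 1.0) for _ in range(10)]
    biased = [random.gauss(centre + 3.0, 1.0) for _ in range(10)]
    ensembles.append([calibrated, biased])

G, b = quadratic_terms(ensembles, observations)
weights = fit_weights(G, b)
print("estimated weights:", [round(w, 3) for w in weights])
```

The kernel is evaluated only once, when building `G` and `b`; the optimisation itself then involves only a $J \times J$ matrix, regardless of the outcome domain. For larger problems, a dedicated quadratic programming solver would be a natural replacement for the projected-gradient loop used here.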

Paper Structure

This paper contains 15 sections, 3 theorems, 34 equations, 13 figures.

Key Result

Proposition 1

Let $S_{k}$ be a kernel score on $\mathcal{Y}$, and let $F_{LP}$ denote the linear pool of the component predictions. Then, for any weights $w_{1}, \dots, w_{J} \geq 0$ that sum to 1, and any $y \in \mathcal{Y}$,

Figures (13)

  • Figure 1: The 82 weather stations in Switzerland.
  • Figure 2: The average weight assigned to each sample member against the member's mean squared error. The $\times$ symbol corresponds to the control member of each discrete predictive distribution. Results are shown at a lead time of 18 hours. In the univariate case, the weights are averaged across all stations.
  • Figure 3: The average weight assigned to each forecast model as a function of lead time. In the univariate case, the weights are averaged across all stations.
  • Figure 4: Forecast model at each of the 82 weather stations that receives the highest weight on average. Results are shown at a lead time of 18 hours.
  • Figure 5: Accuracy of the forecasting models and combination methods as a function of lead time. Accuracy is measured using the average CRPS in the univariate case, and the average energy score in the multivariate case. In the univariate case, the scores are averaged across all stations.
  • ...and 8 more figures

Theorems & Definitions (11)

  • Proposition 1
  • proof
  • Proposition 2
  • proof
  • Example 3: Linearly pooling discrete predictive distributions
  • Example 4: Linearly pooling point predictions
  • Example 5: Linearly pooling order statistics
  • Proposition 6
  • proof
  • Remark 7
  • ...and 1 more