
Surrogate uncertainty estimation for your time series forecasting black-box: learn when to trust

Leonid Erlygin, Vladimir Zholobov, Valeriia Baklanova, Evgeny Sokolovskiy, Alexey Zaytsev

TL;DR

This research introduces a surrogate Gaussian process regression model that enhances any base regression model with reasonable uncertainty estimates and outperforms both bootstrap-based and built-in methods in a medium-data regime.

Abstract

Machine learning models play a vital role in time series forecasting. These models, however, often overlook an important element: point uncertainty estimates. Incorporating these estimates is crucial for effective risk management, informed model selection, and decision-making. To address this issue, our research introduces a method for uncertainty estimation. We employ a surrogate Gaussian process regression model. It enhances any base regression model with reasonable uncertainty estimates. This approach stands out for its computational efficiency. It requires training only one supplementary surrogate and avoids any data-specific assumptions. Furthermore, the method requires only black-box access to the base model and its training data. The effectiveness of our approach is supported by experimental results. Using various time-series forecasting data, we found that our surrogate model-based technique delivers significantly more accurate confidence intervals. It outperforms both bootstrap-based and built-in methods in a medium-data regime. This superiority holds across a range of base model types, including linear regression, ARIMA, gradient boosting, and a neural network.
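The surrogate idea from the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an RBF kernel with fixed hyperparameters, fits a Gaussian process directly to the black-box predictions on the training inputs, and attaches the resulting predictive standard deviation to the base forecast. The names `rbf_kernel`, `surrogate_gp_predict`, and `base_model` are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    # Squared-exponential kernel between the rows of a and b (unit variance).
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def surrogate_gp_predict(x_train, base_pred_train, x_new, noise=1e-2, length_scale=1.0):
    # GP posterior mean and variance at x_new, conditioned on the
    # black-box predictions at the training inputs (a simplifying assumption).
    k_tt = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    k_nt = rbf_kernel(x_new, x_train, length_scale)
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, base_pred_train))
    mean = k_nt @ alpha
    v = np.linalg.solve(chol, k_nt.T)
    var = 1.0 - np.sum(v**2, axis=0)  # prior variance of the unit RBF kernel is 1
    return mean, np.maximum(var, 0.0)

# Toy data and a least-squares line standing in for the black-box base model.
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 20)[:, None]
y_train = 2.0 * x_train[:, 0] + 0.1 * rng.standard_normal(20)
design = np.hstack([x_train, np.ones_like(x_train)])
coef = np.linalg.lstsq(design, y_train, rcond=None)[0]

def base_model(x):
    return x[:, 0] * coef[0] + coef[1]

# One query inside the training range, one far outside it.
x_new = np.array([[0.5], [3.0]])
mean, var = surrogate_gp_predict(x_train, base_model(x_train), x_new, length_scale=0.3)
std = np.sqrt(var)
lower = base_model(x_new) - 1.96 * std  # approximate 0.95 interval around the base forecast
upper = base_model(x_new) + 1.96 * std
```

In this toy setup the interval is narrow at the query inside the training range and wide at the query far from it, which is the "learn when to trust" behaviour the title refers to.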

Paper Structure

This paper contains 30 sections, 1 theorem, 6 equations, 6 figures, 9 tables, 1 algorithm.

Key Result

Lemma 3.1

The computational complexity of evaluating the loss function (eq:gp_loss) is $O(N^3) + O(L N)$.
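The loss itself is not reproduced in this excerpt, so the following is only a sketch of where the two terms could come from. It assumes that $N$ is the number of training points, that $L$ is the number of additional points, and that the loss is a Gaussian process negative log marginal likelihood plus a squared mismatch, weighted by a hypothetical coefficient `c`, between the surrogate mean and the base model predictions at the additional points. The function name `surrogate_loss` and its arguments are illustrative.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    # Squared-exponential kernel between the rows of a and b.
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def surrogate_loss(x, y, x_extra, base_pred_extra, c=1.0, noise=1e-2, length_scale=1.0):
    n = len(x)
    k = rbf_kernel(x, x, length_scale) + noise * np.eye(n)
    # Cholesky factorisation of the N x N kernel matrix: the O(N^3) term.
    chol = np.linalg.cholesky(k)
    # np.linalg.solve does not exploit the triangular structure,
    # but the total cost of this step stays within O(N^3).
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))
    nll = 0.5 * y @ alpha + np.sum(np.log(np.diag(chol))) + 0.5 * n * np.log(2.0 * np.pi)
    # Surrogate mean at the L additional points: an L x N matrix-vector
    # product, which is the O(L N) term once alpha is available.
    mean_extra = rbf_kernel(x_extra, x, length_scale) @ alpha
    penalty = c * np.sum((mean_extra - base_pred_extra) ** 2)
    return nll + penalty

# Toy inputs: N = 10 training points, L = 2 additional points.
x_train = np.linspace(0.0, 1.0, 10)[:, None]
y_train = np.sin(2.0 * np.pi * x_train[:, 0])
x_extra = np.array([[1.5], [2.0]])
base_extra = np.zeros(2)  # stand-in for black-box predictions at x_extra

loss_no_penalty = surrogate_loss(x_train, y_train, x_extra, base_extra, c=0.0)
loss_with_penalty = surrogate_loss(x_train, y_train, x_extra, base_extra, c=1.0)
```

Under these assumptions the factorisation is shared by both terms, so adding the $L$ extra points costs only one further matrix-vector product per loss evaluation.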

Figures (6)

  • Figure 1: Example of uncertainty estimation with our method and other methods: the left plot shows the obtained predictions and corresponding uncertainty estimates, and the right plot provides insight into the quality of the uncertainty estimation. The description of dataset A is available below. (a) Uncertainty estimation for our method BAMOES and a Bootstrap method for a one-step-ahead time series forecasting problem. Solid lines are model predictions, and dashed lines are $0.95$ confidence intervals. Our approach is more adequate, especially during the sudden change in the true values of the target function: it is not overconfident and better reflects the anomaly near the $10$-th time step. (b) Comparison of true and estimated quantiles for the left plot. Our BAMOES method provides better results, staying close to the blue dashed diagonal that corresponds to perfect calibration. This is visible both from the curves themselves and from the miscalibration areas, which we want to minimize.
  • Figure 2: Example of uncertainty estimation for a two-dimensional input via a matching surrogate model. The fill color at each point shows the variance estimate. At points from the initial training sample $X$, the uncertainty is almost zero, while at points from the additional sample $X'$, it takes reasonable values reflecting our lack of knowledge about the true function values at these locations.
  • Figure 3: Model comparison by Miscalibration Area on Forecasting data. Here OLS is selected as the base model.
  • Figure 4: Dependence of the surrogate uncertainty estimate quality metric, Miscalibration Area, on the hyperparameter $C$ for different numbers of generated points.
  • Figure 5: Comparison of surrogate models by Miscalibration Area on Forecasting data, with OLS as the base model.
  • ...and 1 more figure
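The miscalibration area used in these figures can be sketched as follows. The exact definition used in the paper is not given in this excerpt, so the code assumes one common variant: Gaussian predictive distributions, an observed-versus-expected quantile curve, and the area between that curve and the diagonal. The function name `miscalibration_area` is hypothetical.

```python
import math
import numpy as np

def miscalibration_area(y_true, pred_mean, pred_std, n_levels=101):
    # Probability integral transform: Gaussian CDF of each standardised residual.
    z = (y_true - pred_mean) / pred_std
    pit = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    expected = np.linspace(0.0, 1.0, n_levels)
    # Observed fraction of targets falling below each predicted quantile level.
    observed = np.array([np.mean(pit <= p) for p in expected])
    gap = np.abs(observed - expected)
    # Trapezoidal rule for the area between the curve and the diagonal.
    return float(np.sum(0.5 * (gap[1:] + gap[:-1]) * np.diff(expected)))

# Synthetic check: targets drawn from N(0, 1).
rng = np.random.default_rng(0)
y = rng.standard_normal(2000)
zeros = np.zeros_like(y)

area_calibrated = miscalibration_area(y, zeros, np.ones_like(y))
# Predicted standard deviation ten times too small, i.e. an overconfident model.
area_overconfident = miscalibration_area(y, zeros, 0.1 * np.ones_like(y))
```

A perfectly calibrated model gives a curve on the diagonal and an area close to zero, while an overconfident model, whose intervals are too narrow, gives a visibly larger area.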

Theorems & Definitions (2)

  • Lemma 3.1
  • proof