Interval Forecasts for Gas Prices in the Face of Structural Breaks -- Statistical Models vs. Neural Networks
Stephan Schlüter, Sven Pappert, Martin Neumann
TL;DR
This study investigates interval forecasts for Dutch TTF Front Month gas prices in the presence of a structural break triggered by geopolitical events. It compares statistical models (ARMA-APARCH, t-copula time series) with neural networks (MLP, TCN, LSTM) using prediction-interval metrics (PICP, PIAW, interval score) and quantile-based losses. The key finding is that during the shock, simpler models such as ARMA-APARCH and the MLP trained with quality-driven or pinball (PB) losses provide better interval coverage and narrower intervals, while the LSTM underperforms; after the shock, the ranking shifts and the t-copula and ARMA-APARCH models become more competitive. Overall, the results suggest limited robustness of complex neural networks to abrupt regime changes in energy prices and highlight the value of well-calibrated, parsimonious stochastic models for risk management in volatile gas markets.
Abstract
Reliable gas price forecasts are essential information for gas and energy traders, risk managers, and economists. However, ahead of the war in Ukraine, Europe began to suffer from substantially increased and volatile gas prices, which culminated in the aftermath of the Nord Stream 1 explosion. This shock changed both the trend and the volatility structure of prices and has considerable effects on forecasting models. In this study, we investigate whether modern machine learning methods such as neural networks are more resilient to such changes than statistical models such as autoregressive moving average (ARMA) models with conditional heteroskedasticity or copula-based time series models. The focus lies on interval forecasting and the corresponding evaluation measures. As data, we use the Front Month prices from the Dutch Title Transfer Facility (TTF), currently the predominant European exchange. We find that most models underestimate the variance during the shock period and overestimate it in the after-shock period. Furthermore, during the shock, the simpler models, i.e. an ARMA model with conditional heteroskedasticity and the multilayer perceptron (a neural network), perform best with regard to prediction interval coverage. Interestingly, the widely used long short-term memory (LSTM) neural network is outperformed by its competitors.
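
The interval evaluation measures named in the TL;DR can be illustrated with a minimal sketch. It assumes the standard textbook definitions: PICP as the share of observations inside the interval, PIAW as the mean interval width, the interval score for a central (1 - alpha) interval, and the pinball loss for a single quantile level. The exact variants used in the study (for example, a normalized width) may differ. All function names and the numbers in the example are hypothetical and are not TTF data.

```python
from statistics import mean


def picp(y, lower, upper):
    """Prediction interval coverage probability: share of y inside [lower, upper]."""
    return mean(1.0 if l <= obs <= u else 0.0 for obs, l, u in zip(y, lower, upper))


def piaw(lower, upper):
    """Prediction interval average width."""
    return mean(u - l for l, u in zip(lower, upper))


def interval_score(y, lower, upper, alpha):
    """Mean interval score for a central (1 - alpha) prediction interval.

    Width plus a penalty of 2/alpha times the distance by which an
    observation falls outside the interval. Lower is better.
    """
    def score(obs, l, u):
        below = (2.0 / alpha) * (l - obs) if obs < l else 0.0
        above = (2.0 / alpha) * (obs - u) if obs > u else 0.0
        return (u - l) + below + above

    return mean(score(obs, l, u) for obs, l, u in zip(y, lower, upper))


def pinball_loss(y, q_pred, tau):
    """Mean pinball (quantile) loss of quantile forecasts q_pred at level tau."""
    return mean(max(tau * (obs - q), (tau - 1.0) * (obs - q)) for obs, q in zip(y, q_pred))


# Hypothetical observations and 80% prediction intervals (alpha = 0.2).
y_obs = [100.0, 120.0, 90.0, 150.0]
lower = [95.0, 100.0, 95.0, 110.0]
upper = [110.0, 125.0, 105.0, 140.0]

print("PICP:", picp(y_obs, lower, upper))
print("PIAW:", piaw(lower, upper))
print("Interval score:", interval_score(y_obs, lower, upper, alpha=0.2))
print("Pinball loss (tau = 0.9):", pinball_loss(y_obs, upper, tau=0.9))
```

In this toy example only two of the four observations fall inside their intervals, so the empirical coverage of 0.5 lies well below the nominal 0.8. This is the kind of under-coverage that results when a model underestimates the variance, and the interval score penalizes it in addition to the interval width.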
