Interval Forecasts for Gas Prices in the Face of Structural Breaks -- Statistical Models vs. Neural Networks

Stephan Schlüter, Sven Pappert, Martin Neumann

TL;DR

This study investigates interval forecasts for Dutch TTF Front Month gas prices in the presence of a structural break triggered by geopolitical events. It compares statistical models (ARMA-APARCH, t-copula time series) with neural networks (MLP, TCN, LSTM) using prediction-interval metrics (PICP, PIAW, interval score) and quantile-based losses. The key finding is that during the shock, simpler models such as ARMA-APARCH and MLP with quality-driven or PB losses provide better interval coverage and narrower widths, while LSTM underperforms; after the shock, performance shifts and t-copula and ARMA-APARCH become more competitive. Overall, the results suggest limited robustness of complex neural networks to abrupt regime changes in energy prices and highlight the value of well-calibrated, parsimonious stochastic models for risk management in volatile gas markets.
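
To make the interval metrics named above concrete, here is a minimal Python sketch of PICP (the share of observations that fall inside their prediction interval), PIAW (the average interval width), the interval score for central (1 - alpha) intervals, and the quantile (pinball) loss. The function names and toy numbers are illustrative assumptions. The paper may normalise widths or aggregate scores differently.

```python
def picp(y, lower, upper):
    """Prediction interval coverage probability: share of y inside [lower, upper]."""
    inside = sum(1 for yi, lo, hi in zip(y, lower, upper) if lo <= yi <= hi)
    return inside / len(y)


def piaw(lower, upper):
    """Prediction interval average width."""
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)


def interval_score(y, lower, upper, alpha):
    """Mean interval score of central (1 - alpha) intervals: the width plus a
    2/alpha-weighted penalty for each miss. Lower is better."""
    total = 0.0
    for yi, lo, hi in zip(y, lower, upper):
        score = hi - lo
        if yi < lo:
            score += 2.0 / alpha * (lo - yi)
        elif yi > hi:
            score += 2.0 / alpha * (yi - hi)
        total += score
    return total / len(y)


def pinball_loss(y, q, tau):
    """Mean quantile (pinball) loss of quantile forecasts q at level tau."""
    return sum(max(tau * (yi - qi), (tau - 1.0) * (yi - qi)) for yi, qi in zip(y, q)) / len(y)


# Made-up toy data: two of the four observations miss their 90% interval.
y = [10.0, 12.0, 15.0, 9.0]
lower = [9.0, 11.0, 11.0, 9.5]
upper = [11.0, 13.0, 14.0, 10.5]
print(picp(y, lower, upper))                 # 0.5
print(piaw(lower, upper))                    # 2.0
print(interval_score(y, lower, upper, 0.1))  # 9.5
print(pinball_loss(y, upper, 0.95))          # upper bound scored as the 95% quantile
```

The interval score rewards narrow intervals but penalises misses in proportion to their distance. A model can therefore have a small PIAW and still score poorly if its intervals are too tight during a shock.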

Abstract

Reliable gas price forecasts are essential information for gas and energy traders, risk managers, and economists. However, already in the run-up to the war in Ukraine, Europe began to suffer from substantially increased and volatile gas prices, which culminated in the aftermath of the Nord Stream 1 explosion. This shock changed both the trend and the volatility structure of the prices and had considerable effects on forecasting models. In this study, we investigate whether modern machine learning methods such as neural networks are more resilient to such changes than statistical models such as autoregressive moving average (ARMA) models with conditional heteroskedasticity or copula-based time series models. The focus lies on interval forecasting and the corresponding evaluation measures. As data, we use the Front Month prices from the Dutch Title Transfer Facility (TTF), currently the predominant European gas trading hub. We find that most models underestimate the variance during the shock period and overestimate it in the after-shock period. Furthermore, during the shock, the simpler models, i.e. an ARMA model with conditional heteroskedasticity and the multilayer perceptron (a neural network), perform best with regard to prediction interval coverage. Interestingly, the widely used long short-term memory (LSTM) neural network is outperformed by its competitors.
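
As an illustration of an ARMA model with conditional heteroskedasticity, here is a minimal Python sketch of a one-step-ahead prediction interval from an ARMA(1,1)-APARCH(1,1) filter. It assumes the following: Gaussian innovations, parameters that are given rather than estimated, and an input series such as first price differences. The APARCH recursion used is sigma_{t+1}^delta = omega + alpha (|eps_t| - gamma eps_t)^delta + beta sigma_t^delta. With delta = 2 and gamma = 0, it reduces to GARCH(1,1). The function name, the start-up value for the volatility, and the innovation distribution are assumptions and not the authors' specification.

```python
from statistics import NormalDist, pstdev


def arma_aparch_interval(r, mu, phi, theta, omega, alpha, gamma, beta, delta, level=0.9):
    """One-step-ahead central prediction interval from an ARMA(1,1)-APARCH(1,1) filter.

    Illustrative assumptions: parameters are given, innovations are Gaussian,
    and |gamma| < 1 so that |eps| - gamma * eps stays non-negative.
    """
    r_prev, eps_prev = mu, 0.0       # pre-sample values: start at the mean, no shock
    sig_pow = pstdev(r) ** delta     # crude start value for sigma_0^delta
    for r_t in r:
        mean_t = mu + phi * (r_prev - mu) + theta * eps_prev  # ARMA(1,1) conditional mean
        eps_t = r_t - mean_t                                  # innovation at time t
        # APARCH(1,1): sigma_{t+1}^delta from the current shock and sigma_t^delta
        sig_pow = omega + alpha * (abs(eps_t) - gamma * eps_t) ** delta + beta * sig_pow
        r_prev, eps_prev = r_t, eps_t
    mean_next = mu + phi * (r_prev - mu) + theta * eps_prev
    sigma_next = sig_pow ** (1.0 / delta)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)               # about 1.645 for level = 0.9
    return mean_next - z * sigma_next, mean_next + z * sigma_next


# Hypothetical parameters: a calm series versus one ending in a large jump.
calm = arma_aparch_interval([0.0, 0.0, 0.0], 0.0, 0.0, 0.0, 0.1, 0.1, 0.3, 0.5, 1.5)
shock = arma_aparch_interval([0.0, 0.0, 5.0], 0.0, 0.0, 0.0, 0.1, 0.1, 0.3, 0.5, 1.5)
print(calm, shock)
```

In this specification, gamma weights positive and negative shocks differently, and delta lets the model work with a power of sigma other than the variance. How quickly the interval widens after a jump, and how slowly it narrows again, depends on alpha and beta. This is one way such a model can over- or underestimate variance around a structural break.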

Paper Structure (21 sections, 18 equations, 11 figures, 4 tables)

Figures (11)

  • Figure 1: Multilayer perceptron with two hidden neurons and one output. The $n$ features $x_i, i \in \{1,\ldots,n\}$ are first aggregated for each of the two neurons, using the weights $w_{hi1}, i \in \{1, \ldots, n\}$ for the first node and the weights $w_{hi2}, i \in \{1, \ldots, n\}$ for the second node. Each aggregate is then passed through the activation function $\phi_h$. The two results are aggregated again using the weights $w_{o_1}$ and $w_{o_2}$, and an intercept weighted by $w_{o_0}$ is added. Finally, the result is transformed by the activation function $\phi_o$ to yield the output $y$.
  • Figure 2: Left: A dilated causal convolution with dilation factors d = 1, 2, 4 and filter size k = 3. The receptive field covers all values of the input sequence. Right: TCN residual block, which may be repeated with different dilations and kernel sizes (bai2018).
  • Figure 3: Neural network with one hidden layer and two output nodes for interval prediction (khosravi2011).
  • Figure 4: TTF Front Month Prices
  • Figure 5: First Differences
  • ...and 6 more figures
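
The computation described in the Figure 1 caption is $y = \phi_o\big(w_{o_0} + w_{o_1}\,\phi_h(\sum_i w_{hi1} x_i) + w_{o_2}\,\phi_h(\sum_i w_{hi2} x_i)\big)$. Below is a minimal Python sketch of this forward pass. The caption leaves $\phi_h$ and $\phi_o$ unspecified, so tanh for the hidden layer and the identity for the output are assumptions. Following the caption, the hidden nodes carry no intercepts. The function name and weights are hypothetical.

```python
import math


def mlp_forward(x, w_h1, w_h2, w_o, phi_h=math.tanh, phi_o=lambda z: z):
    """Forward pass of the two-hidden-neuron MLP sketched in Figure 1.

    x:    input features x_1..x_n
    w_h1: weights w_{hi1} of the first hidden node (length n)
    w_h2: weights w_{hi2} of the second hidden node (length n)
    w_o:  output weights (w_{o_0}, w_{o_1}, w_{o_2}); w_{o_0} weights the intercept
    """
    h1 = phi_h(sum(w * xi for w, xi in zip(w_h1, x)))  # first hidden neuron
    h2 = phi_h(sum(w * xi for w, xi in zip(w_h2, x)))  # second hidden neuron
    w_o0, w_o1, w_o2 = w_o
    return phi_o(w_o0 + w_o1 * h1 + w_o2 * h2)         # output neuron


# Made-up weights: the first hidden aggregate is 0.5*1 - 0.25*2 = 0, so tanh gives 0.
y = mlp_forward([1.0, 2.0], [0.5, -0.25], [1.0, 0.0], (0.3, 1.0, 2.0))
print(y)  # 0.3 + 2 * tanh(1), about 1.823
```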
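
The receptive-field claim in the Figure 2 caption can be checked with a small sketch. A dilated causal convolution computes $y_t = \sum_{j=0}^{k-1} w_j x_{t - j d}$. A stack with filter size $k$ and dilations $d_1, \ldots, d_L$ therefore sees $1 + (k-1)\sum_l d_l$ inputs, which is 15 for $k = 3$ and $d = 1, 2, 4$. The Python below uses hypothetical function names and all-ones filter weights. It omits the residual connections, normalisation and dropout of the actual TCN block (bai2018).

```python
def dilated_causal_conv(x, w, d):
    """y[t] = sum_j w[j] * x[t - j*d]; w[0] multiplies the current input and
    positions before the start of x are treated as zeros (causal left padding)."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for j, wj in enumerate(w):
            idx = t - j * d
            if idx >= 0:  # never look at future inputs or before the series starts
                acc += wj * x[idx]
        y.append(acc)
    return y


def receptive_field(k, dilations):
    """Number of inputs (current one included) seen by a stack of dilated causal convs."""
    return 1 + (k - 1) * sum(dilations)


# Push a unit impulse at t = 0 through three layers with d = 1, 2, 4 and k = 3:
# exactly the outputs whose receptive field contains t = 0 become non-zero.
impulse = [1.0] + [0.0] * 19
out = impulse
for d in (1, 2, 4):
    out = dilated_causal_conv(out, [1.0, 1.0, 1.0], d)
touched = [t for t, v in enumerate(out) if v != 0.0]
print(len(touched), receptive_field(3, (1, 2, 4)))  # 15 15
```

Doubling the dilation per layer makes the receptive field grow exponentially with depth while the number of weights per layer stays at $k$. This is why a TCN can cover a long input window with only a few layers.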