Natural-gas storage modelling by deep reinforcement learning

Tiziano Balaconi, Aldo Glielmo, Marco Taboga

TL;DR

GasRL addresses how optimal storage policies by a monopolistic operator influence natural-gas price dynamics under regulatory mandates. It couples a calibrated model of the Italian market with a storage-operator RL agent trained via Soft Actor-Critic (SAC) to maximize a multi-objective reward that combines the operator's bank account $g_t$, price volatility $(\Delta p_t)^2$, market-clearing failures $m_t$, and regulatory non-compliance $n_t$, under inventory bounds $0 \le I_t \le I_{\max}$. The results show that realistic price seasonality and volatility emerge endogenously, alongside profitability and a near-absence of market failures, with SAC outperforming other methods. Regulatory experiments with a minimum storage threshold of 83% in November modestly boost resilience to supply shocks, at some cost in profitability and price volatility. The framework offers a reproducible platform for market analysis and regulatory design, with potential extensions to GPUs, multi-agent settings, and international market linkages.
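To make the structure of the multi-objective reward concrete, the following is a minimal sketch and not the authors' implementation. It assumes a linear combination of the four terms. The class `RewardWeights`, the weights `theta_g`, `theta_p` and `theta_m`, and the helper `clip_inventory` are hypothetical names introduced for illustration. Only the regulatory weight $\theta_n$ (0 in the baseline, 1000 in the regulated model) is quoted in the figure captions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights: only theta_n (0 or 1000) is quoted in the source.
    theta_g: float = 1.0   # weight on the bank-account / profit term g_t
    theta_p: float = 1.0   # weight on price volatility (delta p_t)^2
    theta_m: float = 1.0   # weight on market-clearing failures m_t
    theta_n: float = 0.0   # weight on regulatory non-compliance n_t


def clip_inventory(inventory: float, i_max: float) -> float:
    """Enforce the inventory bounds 0 <= I_t <= I_max."""
    return min(max(inventory, 0.0), i_max)


def reward(g_t: float, delta_p_t: float, m_t: float, n_t: float,
           w: RewardWeights = RewardWeights()) -> float:
    """Assumed linear reward: profit term minus the three penalty terms."""
    return (w.theta_g * g_t
            - w.theta_p * delta_p_t ** 2
            - w.theta_m * m_t
            - w.theta_n * n_t)


# Baseline (theta_n = 0) versus regulated (theta_n = 1000) reward for the
# same step, in which the storage threshold is missed (n_t = 1).
baseline = reward(g_t=2.0, delta_p_t=0.5, m_t=0.0, n_t=1.0)
regulated = reward(g_t=2.0, delta_p_t=0.5, m_t=0.0, n_t=1.0,
                   w=RewardWeights(theta_n=1000.0))
```

With these illustrative numbers, the baseline agent is indifferent to missing the threshold, whereas the regulated agent's reward for that step is dominated by the non-compliance penalty.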

Abstract

We introduce GasRL, a simulator that couples a calibrated representation of the natural gas market with a model of storage-operator policies trained with deep reinforcement learning (RL). We use it to analyse how optimal stockpile management affects equilibrium prices and the dynamics of demand and supply. We test various RL algorithms and find that Soft Actor-Critic (SAC) exhibits superior performance in the GasRL environment: multiple objectives of storage operators (including profitability, robust market clearing and price stabilisation) are successfully achieved. Moreover, the equilibrium price dynamics induced by SAC-derived optimal policies have characteristics, such as volatility and seasonality, that closely match those of real-world prices. Remarkably, this adherence to the historical distribution of prices is obtained without explicitly calibrating the model to price data. We show how the simulator can be used to assess the effects of EU-mandated minimum storage thresholds. We find that such thresholds have a positive effect on market resilience against unanticipated shifts in the distribution of supply shocks. For example, with unusually large shocks, market disruptions are averted more often if a threshold is in place.

Paper Structure

This paper contains 7 sections, 11 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Illustration of the GasRL simulator. The RL agent learns a policy $P_t(\mathbf{x}_{t-1})$ for the price of natural gas at time $t$ given the market conditions at time $t-1$. The policy is learned by maximising the expected value of discounted future rewards through repeated interactions with a stochastic market simulator. The instantaneous reward $R_t$ of the RL agent increases with profits, but decreases with price volatility, lack of market clearing and non-compliance with regulations. The instantaneous vector of market conditions $\mathbf{x}_{t}$ includes signals such as the time of the year and the current values of demand and supply, stock of gas, and market shocks.
  • Figure 2: SAC outperforms other RL schemes for GasRL and yields realistic-looking time series. The centre panel shows the mean cumulative episodic test rewards of five standard RL schemes as a function of the number of training steps. SAC stands out as the best-performing RL scheme, achieving better rewards more reliably than its competitors. The other panels show the mean trajectories of the price ($P_t$, left panels) and bank account ($g_t$, right panels) as learned by the SAC agent at 4,000 steps (bottom rows) and at 1.5 million steps (top rows). The RL agent very quickly learns to set prices according to the season, as is apparent from the periodicity of the 4,000-step pricing trajectory, but this is not sufficient to achieve good profits, as indicated by the 4,000-step bank account trajectory. By the end of training, however, the RL agent has learned a much more sophisticated pricing policy that achieves good profitability.
  • Figure 3: GasRL yields profitability, stable markets and reasonable stockpiles. The figure illustrates how the model's test performance, evaluated using different metrics, changes as the number of training steps increases. All panels show the mean and the 95% confidence intervals on the mean computed with 50 repetitions, for the best SAC model saved at different checkpoints. Specifically, from left to right the panels present the means of: reward ($R_t$), bank account ($g_t$), price volatility ($(\Delta p_t)^2$), market success rate ($1-m_t$), and the level of inventories ($I_{t}$) at the beginning of November. With increased training, the reward rises until it converges; the bank account increases with more steps but unevenly, as profitability is sometimes sacrificed in favour of lower volatility. The market-success metric levels off much earlier, at around 32,000 learning steps. The inventories in November rise to 2.5 (or 83% of the storage capacity) at the beginning of training before settling around 2.2 (73% of the storage capacity).
  • Figure 4: GasRL price volatilities and seasonality are consistent with real-world data. The main panel depicts the seasonality of natural-gas prices as computed on real-world data (red bars) and on synthetic data generated by the GasRL simulator (blue bars). Given the high variability in the seasonality of simulated data, we also show the seasonality computed on the prices as averaged over multiple runs (green bars). The inset in the bottom right shows kernel density estimates of the distribution of the first log differences, for the same three series and using the same colour code. In both graphs, the coherence between the real-world data and the output of the GasRL simulator is clear.
  • Figure 5: GasRL suggests that a regulatory threshold on gas stockpiles can increase market stability. The figure illustrates test results for different supply-shock volatility test-values $\sigma_s$ (i.e., what happens when the storage operator unexpectedly faces a volatility of supply shocks that is different from the one used to optimise the policy). Each panel shows the mean and the 95% confidence interval around the mean, computed with 1000 repetitions. From left to right, the panels report the mean values of the market success rate ($1-m_t$), the bank account ($g_t$), the price volatility level ($(\Delta p_t)^2$) and the price level ($P_t$), as a function of supply shock volatility ($\sigma_s$), for the baseline model ($\theta_n = 0$, blue circles) and the regulated model ($\theta_n = 1000$, orange squares). Introducing a penalty for not reaching the 83% minimum-storage threshold seems to slightly improve market success robustness ($1-m_t$). However, this comes at the expense of reduced profits for the gas storage operator, as evidenced by significantly lower bank account values ($g_t$), and at the cost of a slightly increased price volatility $(\Delta p_t)^2$. Interestingly, the average price level ($P_t$) is roughly unaltered by the regulatory requirement.
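
The captions above refer to two simple statistics: the first log differences of a price series (Figure 4) and the mean with a 95% confidence interval on the mean over repeated runs (Figures 3 and 5). The sketch below shows one common way to compute them. It assumes a normal-approximation interval (mean ± 1.96 standard errors), which may differ from the method the authors used. The function names `log_differences` and `mean_with_ci95` are hypothetical.

```python
import math
import statistics


def log_differences(prices):
    """First log differences: log(P_t) - log(P_{t-1}) for consecutive prices."""
    return [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]


def mean_with_ci95(samples):
    """Mean and normal-approximation 95% confidence interval on the mean.

    Assumes at least two samples; uses the sample standard deviation.
    """
    mean = statistics.fmean(samples)
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    half_width = 1.96 * sem
    return mean, mean - half_width, mean + half_width


# Illustrative inputs only (not data from the paper).
returns = log_differences([20.0, 22.0, 21.0, 25.0])
mean, lower, upper = mean_with_ci95([1.0, 2.0, 3.0])
```

With 50 or 1000 repetitions, as in the captions, the normal approximation is usually adequate; for very small samples a Student-t quantile would give a wider interval.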