Deep Reinforcement Learning Algorithms for Option Hedging

Andrei Neagu, Frédéric Godin, Leila Kosseim

TL;DR

This paper tackles dynamic hedging for option risk management by casting it as a sequential decision problem and benchmarking eight DRL algorithms, two of which are novel applications to this task, against the Black-Scholes delta hedge under a GJR-GARCH$(1,1)$ market model. It provides a standardized comparison of algorithm performance and runtime. Monte Carlo Policy Gradient (MCPG) performs best, with a root semi-quadratic penalty (RSQP) of $0.8111$, followed by PPO; the value-based methods underperform, possibly because rewards in the environment are sparse. The study discusses limitations such as computational demands and hyperparameter sensitivity, and suggests reward shaping and higher-dimensional hedging as future directions. The findings point practitioners toward Monte Carlo-based DRL approaches for dynamic hedging and establish a benchmark for future work in deep hedging, including more complex setups with multiple hedging instruments and richer state spaces.

Abstract

Dynamic hedging is a financial strategy that consists in periodically transacting one or multiple financial assets to offset the risk associated with a correlated liability. Deep Reinforcement Learning (DRL) algorithms have been used to find optimal solutions to dynamic hedging problems by framing them as sequential decision-making problems. However, most previous work assesses the performance of only one or two DRL algorithms, making an objective comparison across algorithms difficult. In this paper, we compare the performance of eight DRL algorithms in the context of dynamic hedging: Monte Carlo Policy Gradient (MCPG), Proximal Policy Optimization (PPO), four variants of Deep Q-Learning (DQL), and two variants of Deep Deterministic Policy Gradient (DDPG). Two of these variants represent a novel application to the task of dynamic hedging. In our experiments, we use the Black-Scholes delta hedge as a baseline and simulate the dataset using a GJR-GARCH(1,1) model. Results show that MCPG, followed by PPO, achieves the best performance in terms of the root semi-quadratic penalty. Moreover, MCPG is the only algorithm to outperform the Black-Scholes delta hedge baseline with the allotted computational budget, possibly due to the sparsity of rewards in our environment.
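
To make the baseline and the evaluation metric named in the abstract more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a European call option (the abstract does not specify the option type), and it assumes the root semi-quadratic penalty is the square root of the mean squared hedging shortfall, where a positive hedging error denotes a loss for the hedger. The function names `bs_call_delta` and `rsqp` and all numerical inputs are hypothetical.

```python
import math
from statistics import NormalDist


def bs_call_delta(spot, strike, rate, vol, tau):
    """Black-Scholes delta of a European call (assumed option type).

    tau is the time to maturity in years; vol is the annualized volatility.
    """
    if tau <= 0:
        # At maturity the delta degenerates to an in-the-money indicator.
        return 1.0 if spot > strike else 0.0
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * tau) / (
        vol * math.sqrt(tau)
    )
    return NormalDist().cdf(d1)


def rsqp(hedging_errors):
    """Root semi-quadratic penalty under the assumed sign convention.

    Only positive errors (shortfalls) are penalized; gains count as zero.
    """
    penalties = [max(err, 0.0) ** 2 for err in hedging_errors]
    return math.sqrt(sum(penalties) / len(penalties))


# Hypothetical inputs, for illustration only.
delta = bs_call_delta(spot=100.0, strike=100.0, rate=0.0, vol=0.2, tau=1.0)
penalty = rsqp([3.0, -4.0])  # the gain of 4.0 is ignored by the metric
```

Because the metric ignores gains, a strategy is penalized only on paths where the hedging portfolio falls short of the liability, which is what distinguishes it from a symmetric root-mean-square error.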

Paper Structure

This paper contains 29 sections, 10 equations, 2 figures, 5 tables.

Figures (2)

  • Figure 1: An episode of DRL dynamic hedging for $t=0,\dots,T$
  • Figure 2: Hedging position $\{X_{t+1}\}^{T-1}_{t=0}$ and underlying stock price $\{S_t\}^{T}_{t=0}$ for one simulated episode.