Deep Hedging Under Non-Convexity: Limitations and a Case for AlphaZero

Matteo Maggiolo, Giuseppe Nuti, Miroslav Štrupl, Oleg Szehr

TL;DR

The paper tackles replication under market incompleteness, framing it as a sequential two-player optimization problem. It compares the industry-standard Deep Hedging (DH) approach with Monte Carlo Tree Search (MCTS) planning driven by AlphaZero/MuZero, and identifies fundamental limits of gradient-based deterministic policies in non-convex environments. Theoretical results tie DH to convex optimization and unimodality, while experiments show AlphaZero consistently approaching optimal replication in settings where $Q^*$ is multimodal, and with better sample efficiency. The work highlights a trade-off between the robustness of MCTS-based planning and the practicality of DH, offers insights for hedging under constraints and non-convex costs, and points toward transformer-based hybrids as a promising direction.

Abstract

This paper examines replication portfolio construction in incomplete markets - a key problem in financial engineering with applications in pricing, hedging, balance sheet management, and energy storage planning. We model this as a two-player game between an investor and the market, where the investor makes strategic bets on future states while the market reveals outcomes. Inspired by the success of Monte Carlo Tree Search in stochastic games, we introduce an AlphaZero-based system and compare its performance to deep hedging - a widely used industry method based on gradient descent. Through theoretical analysis and experiments, we show that deep hedging struggles in environments where the optimal action-value function is not subject to convexity constraints - such as those involving non-convex transaction costs, capital constraints, or regulatory limitations - converging to local optima. We construct specific market environments to highlight these limitations and demonstrate that AlphaZero consistently finds near-optimal replication strategies. On the theoretical side, we establish a connection between deep hedging and convex optimization, suggesting that its effectiveness is contingent on convexity assumptions. Our experiments further suggest that AlphaZero is more sample-efficient - an important advantage in data-scarce, overfitting-prone derivative markets.
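The failure mode described in the abstract, gradient-based training settling in a local optimum when the objective is not concave, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the bimodal function `q`, its peak locations and widths, the learning rate, and the starting point are hypothetical and not taken from the paper. The exhaustive grid search is only a stand-in for a search-based method; it is not MCTS and not the authors' AlphaZero system.

```python
import math

# Hypothetical bimodal objective (not from the paper):
# a lower peak near a = -1 and a higher peak near a = 2.
def q(a: float) -> float:
    low_peak = 0.6 * math.exp(-((a + 1.0) ** 2) / 0.18)
    high_peak = 1.0 * math.exp(-((a - 2.0) ** 2) / 0.18)
    return low_peak + high_peak

# Analytic derivative of q with respect to the action a.
def dq(a: float) -> float:
    low = 0.6 * math.exp(-((a + 1.0) ** 2) / 0.18) * (-2.0 * (a + 1.0) / 0.18)
    high = 1.0 * math.exp(-((a - 2.0) ** 2) / 0.18) * (-2.0 * (a - 2.0) / 0.18)
    return low + high

# Plain gradient ascent on a single deterministic action.
def gradient_ascent(a0: float, lr: float = 0.01, steps: int = 2000) -> float:
    a = a0
    for _ in range(steps):
        a += lr * dq(a)
    return a

# Exhaustive search over a discretised action range (stand-in for global search).
def grid_search(lo: float = -3.0, hi: float = 3.0, n: int = 601) -> float:
    grid = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return max(grid, key=q)

a_local = gradient_ascent(-0.5)   # starts in the basin of the lower peak
a_global = grid_search()

print(f"gradient ascent: a = {a_local:.3f}, q = {q(a_local):.3f}")
print(f"grid search:     a = {a_global:.3f}, q = {q(a_global):.3f}")
```

Started at `a0 = -0.5`, the gradient points toward the lower peak and the iterate never sees the higher one. The grid search evaluates the whole action range and so picks the higher peak, at the cost of many more function evaluations.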

Paper Structure

This paper contains 39 sections, 7 theorems, 23 equations, 7 figures, 7 tables.

Key Result

Theorem 1

Consider the portfolio replication MDP described in the paper's sections on replication portfolios and their maintenance, in the setting of its background section. Assume a concave and increasing utility $u$ and a convex cost function $c$. Then the optimal action-value function $Q^*$ is concave in the action $a$.
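A one-step toy check can make the role of the convexity assumption concrete. The sketch below is not the paper's MDP or proof. It assumes a single period, a price move of $\pm 1$ with equal probability, a short position of one unit to be hedged, exponential utility, and two hypothetical cost functions: a quadratic (convex) cost and a fixed fee (non-convex). It tests concavity numerically through second differences on a grid.

```python
import math

def utility(x: float) -> float:
    # Exponential utility: concave and increasing.
    return -math.exp(-x)

def q_one_step(a: float, cost) -> float:
    # Terminal wealth is (a - 1) * dS - cost(a), with dS = +1 or -1, each with prob. 1/2.
    # The "- 1" models a short position of one unit that the hedge a should offset.
    return 0.5 * sum(utility((a - 1.0) * ds - cost(a)) for ds in (-1.0, 1.0))

def convex_cost(a: float) -> float:
    # Hypothetical quadratic trading cost (convex in a).
    return 0.1 * a * a

def fixed_fee_cost(a: float) -> float:
    # Hypothetical fixed fee charged on any non-zero trade (non-convex in a).
    return 0.5 if abs(a) > 1e-12 else 0.0

def max_second_difference(cost) -> float:
    # Actions from -2.0 to 3.0 in steps of 0.05; i / 20 makes a = 0 exact.
    grid = [i / 20 for i in range(-40, 61)]
    values = [q_one_step(a, cost) for a in grid]
    # A concave function has non-positive second differences everywhere.
    return max(values[i - 1] - 2.0 * values[i] + values[i + 1]
               for i in range(1, len(values) - 1))

print("convex cost, max second difference:", max_second_difference(convex_cost))
print("fixed fee,   max second difference:", max_second_difference(fixed_fee_cost))
```

With the quadratic cost, wealth is a concave function of $a$ in each scenario, and composing it with a concave increasing $u$ and then averaging preserves concavity, so every second difference is non-positive. With the fixed fee, the value at $a = 0$ sits above its neighbours, which produces a positive second difference and a second local maximum away from zero.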

Figures (7)

  • Figure 1: Learning deterministic assignments with bi-modal reward signals.
  • Figure 2: Learning optimal actions in a market with a bi-modal $Q^*$-function at the initial state.
  • Figure 3: Histogram of actions chosen at $t_0$.
  • Figure 4: MuZero and DH learning processes.
  • Figure 5: Learning optimal actions in a market where $Q^*$ has a disconnected domain at the initial state.
  • ...and 2 more figures

Theorems & Definitions (12)

  • Theorem 1
  • Lemma 1
  • Lemma 2
  • Proof of Theorem 1
  • Theorem 2: Unimodality of Action-Value and Objective Functions
  • Proof of Theorem 2
  • Lemma 3
  • Proof of Lemma 3
  • Lemma 3
  • proof
  • ...and 2 more