Intertemporal Hedging Demand under Epstein-Zin Preferences in a Multi-Asset Long-Run Risk Model: Evidence from Projected Pontryagin-Guided Deep Policy Optimization
Wonchan Cho
TL;DR
The paper studies intertemporal hedging in a high-dimensional, continuous-time portfolio problem with Epstein--Zin (EZ) recursive preferences and a persistent long-run risk (LRR) factor. It develops a projected Pontryagin-guided deep policy optimization (P-PGDPO) method that represents the value and costate processes with neural networks and updates the policy along the Hamiltonian gradient, enforcing feasibility through explicit projections. Empirically, the learned EZ policy exhibits strong state-dependent hedging and concentrates exposure in assets tied to the LRR factor, while wealth-floor constraints substantially damp hedging near the boundary; CRRA serves as a stable diagnostic benchmark. Overall, the approach shows that a transparent deep learning framework based on the Pontryagin maximum principle (PMP) can yield economically interpretable hedging patterns in realistic multi-asset LRR settings, with wealth constraints shaping the feasible hedging space.
Abstract
I study intertemporal hedging demand in a continuous-time multi-asset long-run risk (LRR) model under Epstein--Zin (EZ) recursive preferences. The investor trades a risk-free asset and several risky assets whose drifts and volatilities depend on an Ornstein--Uhlenbeck type LRR factor. Preferences are described by EZ utility with risk aversion $R$, elasticity of intertemporal substitution $\psi$, and discount rate $\delta$, so that the standard time-additive CRRA case appears as a limiting benchmark. To handle the high-dimensional consumption--investment problem, I use a projected Pontryagin-guided deep policy optimization (P-PGDPO) scheme adapted to EZ preferences. The method starts from the continuous-time Hamiltonian implied by the Pontryagin maximum principle, represents the value and costate processes with neural networks, and updates the policy along the Hamiltonian gradient. Portfolio constraints and a lower bound on wealth are enforced by explicit projection operators rather than by ad hoc penalty terms. Three main findings emerge from numerical experiments in a five-asset LRR economy: \textbf{(1)} the P-PGDPO algorithm converges stably across multiple random seeds, supporting its reliability for solving high-dimensional EZ problems; \textbf{(2)} wealth floors materially reduce hedging demand by limiting the investor's ability to exploit intertemporal risk-return tradeoffs; and \textbf{(3)} the learned hedging portfolios concentrate exposure in assets that are highly correlated with the LRR factor, confirming that EZ agents actively hedge long-run uncertainty rather than merely following myopic rules. Because EZ preferences nest time-additive CRRA in the limit $\psi \to 1/R$, I use CRRA as an explicit diagnostic benchmark and, when needed, as a warm start to stabilize training in high dimensions.
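To make the phrase "updates the policy along the Hamiltonian gradient" with "explicit projection operators" concrete, the following is a minimal sketch of a projected gradient step, not the paper's implementation. It assumes box constraints on portfolio weights and a scalar wealth floor; the function names (`project_box`, `project_wealth_floor`, `projected_gradient_ascent`), the numerical values, and the toy concave mean-variance objective used in place of the true Hamiltonian are all hypothetical and chosen only for illustration.

```python
import numpy as np

def project_box(pi, lower, upper):
    """Euclidean projection of portfolio weights onto the box [lower, upper]^n."""
    return np.clip(pi, lower, upper)

def project_wealth_floor(wealth, floor):
    """Projection of wealth onto the half-line [floor, inf)."""
    return np.maximum(wealth, floor)

def projected_gradient_ascent(grad_fn, pi0, lower, upper, step=1.0, n_iter=500):
    """Repeat: take a gradient step, then project back onto the feasible box."""
    pi = project_box(np.asarray(pi0, dtype=float), lower, upper)
    for _ in range(n_iter):
        pi = project_box(pi + step * grad_fn(pi), lower, upper)
    return pi

# Toy stand-in for the Hamiltonian gradient (assumption, not from the paper):
# objective pi @ excess - 0.5 * R * pi @ cov @ pi with a diagonal covariance.
R = 5.0                                  # risk aversion
excess = np.array([0.04, 0.06, 0.02])    # hypothetical excess returns
cov = np.diag([0.04, 0.09, 0.02])        # hypothetical covariance matrix

def grad(pi):
    return excess - R * cov @ pi

# Closed-form maximizer when no constraint binds.
unconstrained = np.linalg.solve(R * cov, excess)

# Loose box: the constraints never bind, so the iteration recovers the
# unconstrained solution. Tight box: some weights end up at the upper bound.
pi_loose = projected_gradient_ascent(grad, np.zeros(3), lower=-1.0, upper=1.0)
pi_tight = projected_gradient_ascent(grad, np.zeros(3), lower=0.0, upper=0.15)
```

Projection keeps every iterate feasible by construction, whereas a penalty term only discourages violations and requires tuning a penalty weight. In this sketch the covariance is diagonal, so the constrained solution coincides with clipping the unconstrained one; with correlated assets that shortcut would generally fail, and the iterative projected step is what would be needed.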
