Adaptive Insurance Reserving with CVaR-Constrained Reinforcement Learning under Macroeconomic Regimes
Stella C. Dong, James R. Finlay
TL;DR
The paper tackles adaptive insurance reserving under uncertainty by formulating it as a CVaR-constrained reinforcement learning problem with regime-aware curriculum learning. It proposes a PPO-based agent that optimizes reserve decisions within a finite-horizon MDP and enforces tail-risk control and solvency constraints through a CVaR-penalized reward and volatility-adjusted capital floors. The framework is validated on two CAS Loss Reserving datasets, where it improves on classical reserving methods in tail risk (CVaR$_{0.95}$), capital efficiency, and regulatory violation rate, and it remains robust under both stochastic regimes and fixed-shock stress tests. The approach offers a principled, extensible path toward solvency-aware, data-driven reserving that aligns with Solvency II and ORSA requirements and supports regime-stratified decision-making under economic uncertainty.
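To make the tail-risk metric concrete, the sketch below estimates CVaR$_{0.95}$ empirically as the mean of losses at or above the 95% Value-at-Risk, and then applies it as a penalty on a base reward. The names `empirical_cvar` and `cvar_penalized_reward`, the penalty weight `lam`, and the additive penalty form are illustrative assumptions, not the authors' exact specification.

```python
import numpy as np

def empirical_cvar(losses, alpha=0.95):
    """Empirical CVaR_alpha: mean of losses at or above the alpha-quantile (VaR)."""
    losses = np.asarray(losses, dtype=float)
    var = np.quantile(losses, alpha)          # Value-at-Risk threshold
    tail = losses[losses >= var]              # worst (1 - alpha) share of outcomes
    return float(tail.mean())

def cvar_penalized_reward(base_reward, shortfalls, lam=0.5, alpha=0.95):
    """Hypothetical CVaR-penalized reward: r = base_reward - lam * CVaR_alpha(shortfall)."""
    return base_reward - lam * empirical_cvar(shortfalls, alpha)

# Simulated reserve shortfalls 1..100: the tail above VaR_0.95 is {96, ..., 100}.
shortfalls = np.arange(1, 101)
print(empirical_cvar(shortfalls))                       # 98.0
print(cvar_penalized_reward(10.0, shortfalls, lam=0.5)) # -39.0
```

Penalizing the tail mean rather than the average shortfall steers the agent away from rare but severe under-reserving outcomes, which is the behavior the CVaR$_{0.95}$ criterion is meant to capture.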
Abstract
This paper proposes a reinforcement learning (RL) framework for insurance reserving that integrates tail-risk sensitivity, macroeconomic regime modeling, and regulatory compliance. The reserving problem is formulated as a finite-horizon Markov Decision Process (MDP) in which reserve adjustments are optimized using Proximal Policy Optimization (PPO) subject to Conditional Value-at-Risk (CVaR) constraints. To make the policy robust across varying economic conditions, the agent is trained with a regime-aware curriculum that progressively increases volatility exposure. The reward structure penalizes reserve shortfall, capital inefficiency, and solvency floor violations, with design elements informed by the Solvency II and Own Risk and Solvency Assessment (ORSA) frameworks. Empirical evaluations on two industry datasets (Workers' Compensation and Other Liability) show that the RL-CVaR agent outperforms classical reserving methods on multiple criteria, including tail-risk control (CVaR$_{0.95}$), capital efficiency, and regulatory violation rate. The framework also accommodates fixed-shock stress testing and regime-stratified analysis, providing a principled and extensible approach to reserving under uncertainty.
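The three reward penalties named above can be illustrated with a per-step reward function. This is a minimal sketch under stated assumptions: the penalty weights, the linear shortfall and excess-capital terms, the indicator-style floor penalty, and the floor rule `expected_liability * (1 + k * sigma)` are all hypothetical choices for illustration, since the abstract names the components but not their functional form.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical penalty weights; the abstract does not specify values.
    shortfall: float = 1.0        # cost per unit of under-reserving
    inefficiency: float = 0.1     # cost per unit of idle (excess) capital
    floor_violation: float = 5.0  # fixed cost for breaching the solvency floor

def volatility_adjusted_floor(expected_liability, sigma, k=1.0):
    """Hypothetical floor rule: raise the floor in proportion to regime volatility sigma."""
    return expected_liability * (1.0 + k * sigma)

def step_reward(reserve, realized_liability, floor, w=RewardWeights()):
    """Negative weighted sum of shortfall, capital inefficiency, and floor-breach penalties."""
    shortfall = max(realized_liability - reserve, 0.0)  # under-reserving
    excess = max(reserve - realized_liability, 0.0)     # capital tied up beyond need
    violated = 1.0 if reserve < floor else 0.0          # solvency floor breach indicator
    return -(w.shortfall * shortfall
             + w.inefficiency * excess
             + w.floor_violation * violated)

# High-volatility regime: floor rises to 120, so a reserve of 110 breaches it.
floor = volatility_adjusted_floor(100.0, sigma=0.2)
print(step_reward(110.0, 100.0, floor))  # -(0 + 0.1 * 10 + 5) = -6.0
print(step_reward(90.0, 100.0, 80.0))    # shortfall of 10, no breach: -10.0
```

Weighting shortfall more heavily than excess capital encodes the usual asymmetry in reserving, where under-reserving carries solvency consequences while over-reserving mainly costs capital efficiency.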
