Adaptive Insurance Reserving with CVaR-Constrained Reinforcement Learning under Macroeconomic Regimes

Stella C. Dong, James R. Finlay

TL;DR

The paper tackles adaptive insurance reserving under uncertainty by formulating it as a CVaR-constrained reinforcement learning problem with regime-aware curriculum learning. It proposes a PPO-based agent that optimizes reserve decisions within a finite-horizon MDP, enforcing tail-risk control and solvency constraints through a CVaR-penalized reward and volatility-adjusted capital floors. The framework is validated on two CAS Loss Reserving datasets, showing improvements in tail risk (CVaR$_{0.95}$), capital efficiency, and regulatory violation rates, with robustness demonstrated under both stochastic regimes and fixed-shock stress tests. This approach offers a principled, extensible path toward solvency-aware, data-driven reserving that aligns with Solvency II and ORSA requirements and supports regime-stratified decision-making under economic uncertainty.

Abstract

This paper proposes a reinforcement learning (RL) framework for insurance reserving that integrates tail-risk sensitivity, macroeconomic regime modeling, and regulatory compliance. The reserving problem is formulated as a finite-horizon Markov Decision Process (MDP), in which reserve adjustments are optimized using Proximal Policy Optimization (PPO) subject to Conditional Value-at-Risk (CVaR) constraints. To enhance policy robustness across varying economic conditions, the agent is trained using a regime-aware curriculum that progressively increases volatility exposure. The reward structure penalizes reserve shortfall, capital inefficiency, and solvency floor violations, with design elements informed by Solvency II and Own Risk and Solvency Assessment (ORSA) frameworks. Empirical evaluations on two industry datasets (Workers' Compensation and Other Liability) demonstrate that the RL-CVaR agent outperforms classical reserving methods on multiple criteria, including tail-risk control (CVaR$_{0.95}$), capital efficiency, and regulatory violation rate. The framework also accommodates fixed-shock stress testing and regime-stratified analysis, providing a principled and extensible approach to reserving under uncertainty.
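To make the tail-risk criterion concrete, the following is a minimal sketch of how an empirical CVaR$_{0.95}$ and a CVaR-penalized reward could be computed from simulated reserve shortfalls. It is an illustration under stated assumptions, not the paper's implementation: the function names `empirical_cvar` and `cvar_penalized_reward`, the penalty weight `lam`, and the definition of shortfall as `max(0, realized_loss - reserve)` are all hypothetical choices made for this example.

```python
import numpy as np


def empirical_cvar(losses, alpha=0.95):
    """Empirical CVaR: mean of the losses at or above the alpha-quantile (VaR)."""
    losses = np.asarray(losses, dtype=float)
    var = np.quantile(losses, alpha)   # empirical Value-at-Risk at level alpha
    tail = losses[losses >= var]       # never empty: the maximum is always >= var
    return float(tail.mean())


def cvar_penalized_reward(base_reward, shortfalls, alpha=0.95, lam=1.0):
    """Hypothetical reward shaping: subtract lam * CVaR_alpha of the shortfalls.

    `lam` is an assumed penalty weight, not a value taken from the paper.
    """
    return base_reward - lam * empirical_cvar(shortfalls, alpha)


def reserve_shortfalls(realized_losses, reserve):
    """Assumed shortfall definition: the part of each loss not covered by the reserve."""
    realized_losses = np.asarray(realized_losses, dtype=float)
    return np.maximum(0.0, realized_losses - reserve)
```

For the losses 1, 2, ..., 100, the 0.95-quantile is 95.05 under NumPy's default linear interpolation, so the tail consists of 96 through 100 and `empirical_cvar(np.arange(1, 101))` returns 98.0. Averaging over the tail rather than reading off a single quantile is what makes the penalty sensitive to the severity of extreme shortfalls and not only to their frequency.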

Paper Structure

This paper contains 33 sections, 11 equations, 2 figures, 8 tables, 1 algorithm.

Figures (2)

  • Figure 1: Conceptual architecture of the RL-CVaR reserving framework. The agent receives inputs from claims data, macroeconomic regimes, and regulatory constraints, and learns volatility-aware reserve decisions through CVaR-penalized reinforcement learning.
  • Figure 2: Training workflow of the RL-CVaR framework. Macroeconomic shock sampling and claim data define the environment for each regime. The PPO agent learns a CVaR-penalized reserving policy, which is evaluated through regime-specific stress testing.