ClauseLens: Clause-Grounded, CVaR-Constrained Reinforcement Learning for Trustworthy Reinsurance Pricing
Stella C. Dong, James R. Finlay
TL;DR
ClauseLens addresses the opacity of reinsurance pricing by embedding retrieved regulatory clauses into a risk-aware reinforcement learning framework. It formulates quoting as a clause-enhanced, risk-constrained MDP and trains the agent with a dual-projected PPO that combines CVaR tail-risk control with real-time, clause-based action masking. The resulting quotes are regulation-compliant and come with interpretable, clause-grounded explanations. In a calibrated treaty simulator, ClauseLens improves tail-risk performance by about 27.9% (CVaR_0.10) and reduces solvency violations by approximately 51%, while achieving high explanation and retrieval fidelity (88.2% explanation entailment, 87.4% retrieval precision, 91.1% retrieval recall). The work shows that integrating legal context into both the decision and justification pathways can produce auditable, governance-aligned AI for high-stakes financial pricing, with implications for other domains that require regulatory compliance and transparency.
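The TL;DR names three mechanisms: a CVaR tail-risk estimate, a dual (Lagrange multiplier) update with projection, and clause-based action masking. The stdlib-only Python below is a minimal sketch of each under stated assumptions, not the ClauseLens implementation: losses are per-episode values where larger is worse, "dual-projected" is read as projected gradient ascent on the multiplier onto λ ≥ 0 (one common construction; the paper's exact rule is not given here), and the clause retriever is assumed to have already produced a boolean feasibility vector over a discrete set of quote actions. The function names `empirical_cvar`, `projected_dual_step`, and `masked_softmax`, and the budget and learning-rate values, are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical helpers illustrating the mechanisms named in the TL;DR.
# They are a generic sketch, not the ClauseLens code.


def empirical_cvar(losses: Sequence[float], alpha: float = 0.10) -> float:
    """Empirical CVaR_alpha: mean of the worst ceil(alpha * n) losses.

    Assumes larger values are worse (e.g. per-episode underwriting loss).
    """
    if not losses:
        raise ValueError("need at least one loss sample")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    worst_first = sorted(losses, reverse=True)
    k = max(1, math.ceil(alpha * len(worst_first)))
    return sum(worst_first[:k]) / k


def projected_dual_step(lam: float, cvar_estimate: float,
                        budget: float, lr: float = 0.05) -> float:
    """One projected gradient-ascent step on the Lagrange multiplier.

    The multiplier grows while the CVaR constraint is violated
    (cvar_estimate > budget) and is clipped back onto lam >= 0.
    """
    return max(0.0, lam + lr * (cvar_estimate - budget))


def masked_softmax(logits: Sequence[float],
                   feasible: Sequence[bool]) -> List[float]:
    """Policy probabilities with clause-infeasible actions given zero mass."""
    if len(logits) != len(feasible):
        raise ValueError("logits and feasibility mask must align")
    allowed = [x for x, ok in zip(logits, feasible) if ok]
    if not allowed:
        raise ValueError("no clause-feasible action available")
    m = max(allowed)  # subtract the max feasible logit for numerical stability
    weights = [math.exp(x - m) if ok else 0.0
               for x, ok in zip(logits, feasible)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Toy per-episode losses; with alpha = 0.10 the tail is the single worst one.
    losses = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
    cvar = empirical_cvar(losses, alpha=0.10)
    lam = projected_dual_step(0.0, cvar, budget=8.0, lr=0.5)
    probs = masked_softmax([0.2, 1.5, -0.3], [True, False, True])
    print(cvar, lam, probs)
```

In a PPO loop, the multiplier returned by `projected_dual_step` would scale a CVaR penalty in the policy objective, and `masked_softmax` would replace the unmasked action distribution so that clause-infeasible quotes are never sampled. This is the standard Lagrangian pattern for constrained RL; the paper's own update schedule and masking details may differ.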
Abstract
Reinsurance treaty pricing must satisfy stringent regulatory standards, yet current quoting practices remain opaque and difficult to audit. We introduce ClauseLens, a clause-grounded reinforcement learning framework that produces transparent, regulation-compliant, and risk-aware treaty quotes. ClauseLens models the quoting task as a Risk-Aware Constrained Markov Decision Process (RA-CMDP). Statutory and policy clauses are retrieved from legal and underwriting corpora, embedded into the agent's observations, and used both to constrain feasible actions and to generate clause-grounded natural-language justifications. Evaluated in a multi-agent treaty simulator calibrated to industry data, ClauseLens reduces solvency violations by 51%, improves tail-risk performance by 27.9% (CVaR_0.10), and achieves 88.2% accuracy in clause-grounded explanations, with retrieval precision of 87.4% and recall of 91.1%. These findings demonstrate that embedding legal context into both the decision and explanation pathways yields interpretable, auditable, and regulation-aligned quoting behavior consistent with Solvency II, NAIC RBC, and the EU AI Act.
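For readers who want the RA-CMDP written out, one standard way to express a CVaR-constrained MDP with clause-restricted actions is sketched below. The notation is an assumption for illustration, not the paper's: $c_t$ is the retrieved clause set at step $t$, $\mathcal{A}_{\mathrm{feas}}(s_t, c_t)$ the clause-feasible actions, $L$ the episode loss, $\kappa$ the tail-risk budget, $\nu$ the auxiliary variable in the Rockafellar-Uryasev form of CVaR, and $\beta$ the dual step size.

```latex
\begin{aligned}
\max_{\theta}\;\; & J_R(\pi_\theta) = \mathbb{E}_{\pi_\theta}\Big[\textstyle\sum_{t=0}^{T} \gamma^{t}\, r(s_t, a_t)\Big] \\
\text{s.t.}\;\; & \mathrm{CVaR}_{\alpha}\big(L(\pi_\theta)\big) \le \kappa,
  \qquad a_t \in \mathcal{A}_{\mathrm{feas}}(s_t, c_t)\ \ \forall t, \\[4pt]
& \mathrm{CVaR}_{\alpha}(L) = \min_{\nu \in \mathbb{R}}
  \Big\{ \nu + \tfrac{1}{\alpha}\, \mathbb{E}\big[(L - \nu)_{+}\big] \Big\},
  \qquad \alpha = 0.10, \\[4pt]
& \mathcal{L}(\theta, \lambda) = J_R(\pi_\theta)
  - \lambda \big( \mathrm{CVaR}_{\alpha}(L(\pi_\theta)) - \kappa \big), \qquad \lambda \ge 0, \\[4pt]
& \lambda_{k+1} = \big[ \lambda_k + \beta \big( \widehat{\mathrm{CVaR}}_{\alpha} - \kappa \big) \big]_{+}.
\end{aligned}
```

The policy ascends $\mathcal{L}$ in $\theta$ (e.g. with a PPO surrogate) while $\lambda$ ascends the constraint violation and is projected onto $\lambda \ge 0$ by $[\cdot]_+$. The hard clause constraint $a_t \in \mathcal{A}_{\mathrm{feas}}$ is enforced by masking rather than by a multiplier. When $\alpha n$ is an integer, the empirical estimate $\widehat{\mathrm{CVaR}}_{\alpha}$ over $n$ sampled episodes equals the mean of the $\alpha n$ largest losses.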
