Conformal prediction for frequency-severity modeling

Helton Graziadei; Paulo C. Marques F.; Eduardo F. L. de Melo; Rodrigo S. Targino

TL;DR

This paper develops a model-agnostic conformal prediction framework to construct finite-sample prediction intervals for two-stage frequency-severity insurance models, applicable to both parametric and machine-learning severity predictors. It extends split conformal prediction to jointly handle frequency and severity with a residual-based conformity scheme, guaranteeing coverage at $1-\alpha$ up to a tunable finite-sample bound, and introduces a two-stage out-of-bag extension using random forests that removes the need for calibration data altogether. On synthetic data and real insurance datasets (motor third party liability (MTPL) in Belgium and crop insurance in Brazil), the approach achieves comparable coverage with substantially narrower intervals when random forests rather than parametric models are used for severity, and further gains with the out-of-bag variant. The work concludes that conformal prediction provides reliable uncertainty quantification for frequency-severity modeling, with practical impact for risk pricing and reserving, and offers open-source software to reproduce the results.

Abstract

We present a model-agnostic framework for constructing prediction intervals for insurance claims with finite-sample statistical guarantees, extending the technique of split conformal prediction to the domain of two-stage frequency-severity modeling. The effectiveness of the framework is demonstrated on simulated and real datasets using classical parametric models and contemporary machine learning methods. When the underlying severity model is a random forest, we extend the two-stage split conformal prediction algorithm, showing how the out-of-bag mechanism can be leveraged to eliminate the need for a calibration set in the conformal procedure.
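The out-of-bag idea can be illustrated in a simpler one-stage setting: each training point's conformity score is computed only from ensemble members whose bootstrap samples did not contain it, so no separate calibration set has to be held out. The sketch below is a hedged illustration under stated assumptions, not the paper's two-stage out-of-bag algorithm: to stay dependency-free it swaps random forest trees for bagged one-dimensional regression stumps, uses absolute residuals as conformity scores, and the names fit_stump and oob_conformal are hypothetical.

```python
import math
import random


def fit_stump(xs, ys):
    """Least-squares depth-1 regression tree on 1-D inputs (stand-in for a full tree)."""
    pairs = sorted(zip(xs, ys))
    mean_all = sum(ys) / len(ys)
    # (sse, threshold, left mean, right mean); the no-split fallback predicts the mean.
    best = (sum((y - mean_all) ** 2 for y in ys), math.inf, mean_all, mean_all)
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot split between tied x values
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if sse < best[0]:
            best = (sse, (pairs[k - 1][0] + pairs[k][0]) / 2, lm, rm)
    _, thr, lm, rm = best
    return lambda x: lm if x <= thr else rm


def oob_conformal(xs, ys, x_new, alpha=0.1, n_trees=50, seed=0):
    """Hypothetical one-stage OOB conformal interval for a bagged ensemble."""
    rng = random.Random(seed)
    n = len(xs)
    trees, in_bag = [], []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample indices
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
        in_bag.append(set(idx))
    scores = []
    for i in range(n):
        oob = [t for t, bag in zip(trees, in_bag) if i not in bag]
        if oob:  # skip points that appeared in every bootstrap sample
            pred = sum(t(xs[i]) for t in oob) / len(oob)
            scores.append(abs(ys[i] - pred))
    m = len(scores)
    k = math.ceil((1 - alpha) * (m + 1))
    r_hat = math.inf if k > m else sorted(scores)[k - 1]
    center = sum(t(x_new) for t in trees) / n_trees  # full-ensemble point prediction
    return center - r_hat, center + r_hat


# Illustrative data (invented): a step function plus Gaussian noise.
rng = random.Random(1)
xs = [rng.uniform(0.0, 10.0) for _ in range(100)]
ys = [2.0 * (x > 5.0) + rng.gauss(0.0, 0.5) for x in xs]
lo, hi = oob_conformal(xs, ys, x_new=7.0, alpha=0.1)
print(f"90% OOB conformal interval at x = 7: [{lo:.2f}, {hi:.2f}]")
```

With an actual random forest, the hand-rolled stumps would be replaced by a library model that exposes out-of-bag predictions; for example, scikit-learn's RandomForestRegressor fitted with oob_score=True stores them in oob_prediction_. This sketch does not reproduce the finite-sample guarantees of the paper's two-stage variant.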

Paper Structure (14 sections, 7 equations, 11 figures, 6 tables, 2 algorithms)

Introduction
Frequency-severity modeling
Two-stage split conformal prediction
  The classical procedure
  Two-stage setting
Synthetic and real datasets
  Synthetic data
  Motor third party liability in Belgium
  Crop insurance in Brazil
Out-of-bag extension
  Random forests and the out-of-bag mechanism
  Two-stage out-of-bag conformal prediction
  Out-of-bag performance
Concluding remarks

Figures (11)

Figure 1: An illustration of the split conformal procedure for regression with a single predictor. We start with fourteen data points, which are randomly split into the nine gray and five black points in the figure, representing the training and calibration samples, respectively. The gray line $\hat{\mu}$ is a predictive model fitted to the training sample using a nonparametric method. The lengths of the black segments are the values of the calibration conformity scores. The nominal miscoverage level is $\alpha=40\%$, so that $\hat{r}=R_{(\lceil(1-0.4)(5+1)\rceil)}=R_{(4)}$, the fourth smallest calibration score. A future predictor $x^*$ and the corresponding conformal prediction interval are depicted in the figure.
Figure 2: Frequency and severity distributions for the 10,000 sample units in the synthetic dataset.
Figure 3: Prediction intervals produced by the two-stage split conformal prediction algorithm for fifty test sample units in the synthetic dataset, using a nominal miscoverage level $\alpha=10\%$. The black dots are the observed severity values. The left and right panels show the results using gamma regression and random forests, respectively, for the severity stage.
Figure 4: RMSE distributions based on 100 replications of the random forest models with 10, 100, and 1,000 trees.
Figure 5: Frequency and severity distributions for the 163,212 policies in the motor third party liability dataset.
...and 6 more figures
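The order-statistic arithmetic in Figure 1, and the coverage and interval-width comparison behind Figure 3, can be sketched in plain Python. This is a minimal, hedged illustration of the classical one-stage split conformal procedure with absolute-residual conformity scores, not the paper's two-stage algorithm: a least-squares line stands in for the nonparametric fit of Figure 1, the simulated data are invented for illustration, and the function names conformal_quantile, split_conformal, and coverage_and_width are hypothetical.

```python
import math
import random


def conformal_quantile(scores, alpha):
    """R_(k) with k = ceil((1 - alpha)(n + 1)); infinite when k exceeds n."""
    n = len(scores)
    k = math.ceil((1 - alpha) * (n + 1))
    return math.inf if k > n else sorted(scores)[k - 1]


def fit_line(xs, ys):
    """Least-squares line, a stand-in for the nonparametric fit in Figure 1."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def split_conformal(train, calib, alpha):
    """Fit on train, score absolute residuals on calib, return x -> (lower, upper)."""
    mu = fit_line(*zip(*train))
    scores = [abs(y - mu(x)) for x, y in calib]
    r_hat = conformal_quantile(scores, alpha)
    return lambda x: (mu(x) - r_hat, mu(x) + r_hat)


def coverage_and_width(interval, test):
    """Empirical coverage and mean width of the intervals over test pairs."""
    bounds = [interval(x) for x, _ in test]
    covered = sum(lo <= y <= hi for (lo, hi), (_, y) in zip(bounds, test))
    width = sum(hi - lo for lo, hi in bounds) / len(test)
    return covered / len(test), width


# Figure 1 arithmetic: five calibration scores, alpha = 0.4 -> fourth smallest score.
print(conformal_quantile([0.5, 0.1, 0.9, 0.3, 0.7], alpha=0.4))  # 0.7

# Coverage and width on simulated data (nominal coverage 90%).
rng = random.Random(0)
data = []
for _ in range(2000):
    x = rng.uniform(0.0, 5.0)
    data.append((x, 1.0 + 2.0 * x + rng.gauss(0.0, 1.0)))
train, calib, test = data[:1000], data[1000:1500], data[1500:]
cov, width = coverage_and_width(split_conformal(train, calib, alpha=0.1), test)
print(f"coverage = {cov:.3f}, mean width = {width:.2f}")
```

Returning an infinite radius when k exceeds n reflects that, with too few calibration points for the requested level, no finite order statistic delivers the nominal coverage.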
