Model Monitoring: A General Framework with an Application to Non-life Insurance Pricing

Alexej Brauer, Paul Menzel, Mario V. Wüthrich

TL;DR

The paper addresses how non-life insurance pricing models degrade under concept drift in production data. It develops a monitoring framework that combines a Gini-based risk-ranking test with auto-calibration tests based on Murphy's score decomposition, underpinned by a new asymptotic distribution result for the Gini score and a consistent bootstrap estimator of its asymptotic variance. The framework is model-agnostic and is demonstrated on a modified motor insurance portfolio with controlled drift scenarios, where it guides decisions on refitting or recalibrating models. Practical considerations and possible extensions (adaptive windowing, recurrent drift, and multi-method drift attribution) are discussed to support deployment under real-world model governance.

Abstract

Maintaining the predictive performance of pricing models is challenging when insurance portfolios and data-generating mechanisms evolve over time. Focusing on non-life insurance, we adopt the concept-drift terminology from machine learning and distinguish virtual drift from real concept drift in an actuarial setting. Methodologically, we (i) formalize deviance loss and Murphy's score decomposition to assess global and local auto-calibration; (ii) study the Gini score as a rank-based performance measure, derive its asymptotic distribution, and develop a consistent bootstrap estimator of its asymptotic variance; and (iii) combine these results into a statistically grounded, model-agnostic monitoring framework that integrates a Gini-based ranking drift test with global and local auto-calibration tests. An application to a modified motor insurance portfolio with controlled concept-drift scenarios illustrates how the framework guides decisions on refitting or recalibrating pricing models.
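
To make the calibration side of the abstract concrete, the following is a minimal sketch of Murphy's score decomposition, which splits an average score into uncertainty, discrimination, and miscalibration terms, $S = \mathrm{UNC} - \mathrm{DSC} + \mathrm{MCB}$. Several choices here are assumptions rather than the paper's method: the Poisson deviance is used as the loss, the recalibrated predictor is estimated by isotonic regression (pool-adjacent-violators), and all function names are hypothetical. Responses are assumed non-negative and predictions strictly positive.

```python
import math


def poisson_deviance(y, mu):
    """Unit Poisson deviance 2*(y*log(y/mu) - y + mu), with y*log(y) := 0 at y = 0."""
    if y == 0:
        return 2.0 * mu
    return 2.0 * (y * math.log(y / mu) - y + mu)


def mean_deviance(y, mu):
    """Average deviance loss of predictions mu for responses y."""
    return sum(poisson_deviance(a, b) for a, b in zip(y, mu)) / len(y)


def isotonic_recalibration(y, mu_hat):
    """Pool-adjacent-violators fit of y on the ranking induced by mu_hat.

    Assumption: this isotonic fit stands in for the recalibrated predictor
    E[Y | mu_hat]; ties in mu_hat are not treated specially.
    """
    order = sorted(range(len(y)), key=lambda i: mu_hat[i])
    blocks = []  # each block is [sum of responses, number of observations]
    for i in order:
        blocks.append([y[i], 1])
        # Merge backwards while the previous block mean exceeds the last one.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = [0.0] * len(y)
    pos = 0
    for s, c in blocks:
        for i in order[pos:pos + c]:
            fitted[i] = s / c
        pos += c
    return fitted


def murphy_decomposition(y, mu_hat):
    """Return the score and its UNC, DSC, MCB parts (score = UNC - DSC + MCB)."""
    y_bar = sum(y) / len(y)
    score = mean_deviance(y, mu_hat)
    score_rc = mean_deviance(y, isotonic_recalibration(y, mu_hat))
    unc = mean_deviance(y, [y_bar] * len(y))
    return {
        "score": score,
        "unc": unc,             # loss of the constant (null) prediction
        "dsc": unc - score_rc,  # discrimination gained by the ranking
        "mcb": score - score_rc # loss removable by recalibration alone
    }


if __name__ == "__main__":
    y = [0, 1, 0, 2, 1, 3]
    mu_hat = [0.2, 0.5, 0.7, 1.0, 1.5, 2.0]
    print(murphy_decomposition(y, mu_hat))
```

Under these assumptions, a large miscalibration term with stable discrimination would point towards recalibration, whereas a drop in discrimination would point towards refitting; the formal tests that drive these decisions are developed in the paper itself and are not reproduced here.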

Paper Structure

This paper contains 17 sections, 1 theorem, 24 equations, 9 figures, 3 tables, 4 algorithms.

Key Result

Theorem 1

Assume $(Y,\hat{\mu}) \sim F_{Y,\hat{\mu}}$ with finite first moments $\mathbb{E}[Y] < \infty$ and $\mathbb{E}[\hat{\mu}] < \infty$. Moreover, assume that the marginal distributions of $Y$ and $\hat{\mu}$ are continuous. Let $(Y_i,\hat{\mu}_i)$, $i \ge 1$, be i.i.d. copies of $(Y,\hat{\mu})$. Then [...] where $\widehat{G}_n(Y,\hat{\mu})$ is the empirical (finite sample) Gini score as defined in Defini...
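
As an illustration of the quantities the theorem refers to, the sketch below computes an empirical Gini score and a nonparametric bootstrap estimate of its variance. The definition used is an assumption: a common form based on the cumulative accuracy profile (CAP), normalised by the CAP of a perfect ranking, which may differ in detail from the paper's Definitions 2 to 4. The function names, the number of bootstrap samples, and the seed are hypothetical choices. Responses are assumed non-negative and not all equal.

```python
import random
import statistics


def cap_area(y, score):
    """Area under the empirical cumulative accuracy profile.

    Observations are sorted by decreasing score; the curve tracks the
    cumulative share of total response captured, integrated by trapezoids.
    """
    order = sorted(range(len(y)), key=lambda i: score[i], reverse=True)
    total = sum(y)
    n = len(y)
    area, cum = 0.0, 0.0
    for i in order:
        prev = cum
        cum += y[i] / total
        area += 0.5 * (prev + cum) / n
    return area


def gini_score(y, mu_hat):
    """CAP-based Gini score, equal to 1 when mu_hat ranks exactly like y."""
    return (cap_area(y, mu_hat) - 0.5) / (cap_area(y, y) - 0.5)


def bootstrap_gini(y, mu_hat, n_boot=200, seed=0):
    """Gini scores on n_boot resamples of the pairs (y_i, mu_hat_i)."""
    rng = random.Random(seed)
    n = len(y)
    draws = []
    while len(draws) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # degenerate resample: the normalising area would be zero
        draws.append(gini_score(ys, [mu_hat[i] for i in idx]))
    return draws


if __name__ == "__main__":
    y = [0, 1, 0, 2, 1, 3]
    mu_hat = [0.2, 0.5, 0.7, 1.0, 1.5, 2.0]
    draws = bootstrap_gini(y, mu_hat)
    print("Gini:", gini_score(y, mu_hat))
    print("bootstrap variance:", statistics.variance(draws))
```

Resampling pairs duplicates some observations, but duplicated pairs share the same response, so their relative order does not change the score. A histogram of `draws` for a realistic holdout size is the kind of picture described in the caption of Figure 3.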

Figures (9)

  • Figure 1: Schematic representation of model monitoring (left) and model comparison (right).
  • Figure 2: Geometric visualization of the Gini score.
  • Figure 3: Histograms of the bootstrap Gini indices for varying numbers of bootstrap samples $B$ (left) and varying holdout sample sizes $n$ (right).
  • Figure 4: Example timeline of a model update cycle in 2024. The index $2024$ in $\hat{\mu}_{2024}$ indicates the year in which the model is developed.
  • Figure 5: Example timeline for decision-making in 2025. The index $2024$ in $\hat{\mu}_{2024}$ indicates the year in which the model is developed.
  • ...and 4 more figures

Theorems & Definitions (11)

  • Definition 1: Auto-Calibration
  • Remark 1
  • Remark 2
  • Definition 2: Cumulative accuracy profile
  • Definition 3: Gini score
  • Theorem 1: Asymptotic Normality of the machine-learning Gini score
  • Definition 4: Empirical Gini score
  • Remark 3
  • Proof of Theorem 1 (Asymptotic Normality of the machine-learning Gini score)
  • Example 2.1: Deviance loss for gamma EDF
  • ...and 1 more