Smoothness and monotonicity constraints for neural networks using ICEnet

Ronald Richman; Mario Wüthrich

TL;DR

The paper tackles the lack of principled smoothness and monotonicity constraints in neural networks for actuarial tasks by introducing ICEnet. ICEnet augments the data with pseudo-data in which the constrained variables are varied over a range of values, and trains a single fully connected network (FCN) to produce both the standard predictions and the individual conditional expectation (ICE) outputs on which the constraints are imposed. The loss combines the predictive deviance $L^D$ with a smoothing term $L_2$ based on third differences and a monotonicity term $L_3$ based on first differences of the ICE outputs, and the same network is applied to every pseudo-data row in a time-distributed manner. Empirical results on a French motor third-party liability (MTPL) dataset show that monotonicity constraints can improve out-of-sample performance and yield more commercially interpretable ICE and partial dependence plot (PDP) curves, while smoothing constraints may trade some accuracy for stronger guarantees on model behavior. A Local ICEnet variant further reduces computation while preserving many of the constraint-related benefits, making the approach practical for real-world deployment in actuarial pricing.
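The penalty terms described above can be illustrated on a matrix of ICE outputs. The sketch below is a minimal NumPy version under explicit assumptions, not the paper's exact formulation: $L_2$ is taken as the squared third differences, $L_3$ as a hinge on first differences that go in the wrong direction, $L^D$ as the Poisson deviance (a common choice for claim counts), and `lambda_smooth` and `lambda_mono` are hypothetical weights. The paper's scaling and aggregation of these terms may differ, and during training the terms would be written in a deep learning framework so they can be differentiated; NumPy is used here only to show the arithmetic.

```python
import numpy as np


def smoothness_penalty(ice: np.ndarray) -> float:
    """Assumed L_2: mean over instances of summed squared third differences.

    `ice` has shape (n, K): row i holds the ICE outputs of instance i for one
    constrained covariate on an ordered grid of K values.
    """
    d3 = np.diff(ice, n=3, axis=1)                  # shape (n, K - 3)
    return float(np.mean(np.sum(d3 ** 2, axis=1)))


def monotonicity_penalty(ice: np.ndarray, increasing: bool = True) -> float:
    """Assumed L_3: mean over instances of summed first differences going the wrong way."""
    d1 = np.diff(ice, n=1, axis=1)                  # shape (n, K - 1)
    direction = 1.0 if increasing else -1.0
    violations = np.maximum(0.0, -direction * d1)   # zero wherever the constraint holds
    return float(np.mean(np.sum(violations, axis=1)))


def poisson_deviance(y: np.ndarray, mu: np.ndarray) -> float:
    """Mean Poisson deviance, used here as L^D (requires mu > 0)."""
    safe_y = np.where(y > 0, y, 1.0)                # avoid log(0); those terms are zeroed below
    term = np.where(y > 0, y * np.log(safe_y / mu), 0.0)
    return float(np.mean(2.0 * (term - (y - mu))))


def icenet_loss(y, mu, ice, lambda_smooth=1.0, lambda_mono=1.0, increasing=True):
    """L^D + lambda_smooth * L_2 + lambda_mono * L_3 with hypothetical weights."""
    return (poisson_deviance(y, mu)
            + lambda_smooth * smoothness_penalty(ice)
            + lambda_mono * monotonicity_penalty(ice, increasing))


cubic = np.array([[k ** 3 for k in range(6)]], dtype=float)
print(smoothness_penalty(cubic))                      # 108.0
print(monotonicity_penalty(cubic))                    # 0.0
print(monotonicity_penalty(cubic, increasing=False))  # 125.0
```

A cubic curve on the integer grid 0, ..., 5 has three third differences, each equal to 6, so its smoothness penalty is 3 × 6² = 108, whereas a straight line incurs (numerically) zero smoothness penalty. Because the cubic only increases, it violates a decreasing constraint by its total rise of 125.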

Abstract

Deep neural networks have become an important tool for use in actuarial tasks, due to the significant gains in accuracy provided by these techniques compared to traditional methods, but also due to the close connection of these models to the Generalized Linear Models (GLMs) currently used in industry. Whereas constraining GLM parameters relating to insurance risk factors to be smooth or exhibit monotonicity is trivial, methods to incorporate such constraints into deep neural networks have not yet been developed. This is a barrier for the adoption of neural networks in insurance practice since actuaries often impose these constraints for commercial or statistical reasons. In this work, we present a novel method for enforcing constraints within deep neural network models, and we show how these models can be trained. Moreover, we provide example applications using real-world datasets. We call our proposed method ICEnet to emphasize the close link of our proposal to the individual conditional expectation (ICE) model interpretability technique.

Paper Structure (19 sections, 15 equations, 11 figures)

This paper contains 19 sections, 15 equations, 11 figures.

Introduction
Neural networks and Individual Conditional Expectations
Supervised learning and neural networks
Pre-processing covariates for FCNs
Categorical covariates
Numerical covariates
Individual Conditional Expectations and Partial Dependence Plots
ICEnet
Description of the ICEnet
Definition of the ICEnet
Applying the ICEnet
Introduction and exploratory analysis
Fitting the ICEnet
Exploring the ICEnet predictions
Varying the ICEnet constraints
...and 4 more sections

Figures (11)

Figure 1: Diagram explaining the ICEnet. The same neural network $\Psi_W$ is used both to produce the model predictions and to produce predictions on pseudo-data. The latter predictions are constrained, ensuring that the outputs of the ICEnet vary smoothly or monotonically with changes in the input variables $\boldsymbol{x}$. In this diagram, variable $x_1$ is varied to produce the ICEnet outputs $\Psi_W(\tilde{\boldsymbol{x}}^{[1]}(\cdot))$.
Figure 2: Empirical claims frequency (top panel) and observed exposures (bottom panel) in the French MTPL dataset for each of the Bonus-Malus Level, Density, Driver Age, Vehicle Age and Vehicle Power covariates (univariate analysis only), learning set only. Note that the $y$-scales for each variable are not comparable.
Figure 3: PDPs for each of the Bonus-Malus Level, Density, Driver Age, Vehicle Age and Vehicle Power fields shown in separate panels, test set only. Blue lines are PDPs from the FCNs (unsmoothed) and red lines are PDPs from the ICEnet (smoothed). Bold lines relate to the PDPs from the first of 10 runs; the lighter lines relate to the remaining runs. Note that the scale of the $y$-axis varies between each panel.
Figure 4: Density plots of the difference between the monotonicity and smoothness components of the ICEnet loss function, evaluated for each observation in the test set.
Figure 5: ICE plots of the outputs of the FCN and the ICEnet for the instances $n$ that are least monotonic, as measured by the monotonicity score computed on the FCN outputs for each instance in the test set. The smoothed model is the ICEnet and the unsmoothed model is the FCN.
...and 6 more figures
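The pseudo-data construction in Figure 1, and the ICE and PDP curves shown in Figures 3 and 5, can be sketched in a few lines. This is a minimal NumPy illustration under stated assumptions: `toy_model` is a hypothetical stand-in for the fitted network $\Psi_W$, the grid of values for the varied covariate is chosen arbitrarily, and the helper names are illustrative only. The single batched `predict` call mimics applying one shared network to every pseudo-data row.

```python
import numpy as np


def make_pseudo_data(X: np.ndarray, j: int, grid: np.ndarray) -> np.ndarray:
    """Copy each instance once per grid value and overwrite covariate j.

    Returns shape (n, K, q): entry [i, k] is instance i with column j set to
    grid[k], a discrete version of the pseudo-data x-tilde^[j](.) in Figure 1.
    """
    X = np.asarray(X, dtype=float)
    pseudo = np.repeat(X[:, None, :], len(grid), axis=1)  # new array, X untouched
    pseudo[:, :, j] = grid[None, :]
    return pseudo


def ice_curves(predict, X: np.ndarray, j: int, grid: np.ndarray) -> np.ndarray:
    """Apply one shared predictor to all pseudo-data rows in a single batch."""
    pseudo = make_pseudo_data(X, j, grid)
    n, K, q = pseudo.shape
    flat = predict(pseudo.reshape(n * K, q))  # same weights for every grid point
    return flat.reshape(n, K)                 # row i is the ICE curve of instance i


def pdp(ice: np.ndarray) -> np.ndarray:
    """Partial dependence: the average of the ICE curves over instances."""
    return ice.mean(axis=0)


def toy_model(X: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the fitted network Psi_W (positive outputs)."""
    return np.exp(0.3 * X[:, 0] - 0.1 * X[:, 1] ** 2 + 0.05 * X[:, 0] * X[:, 1])


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))            # 5 instances, 2 covariates
grid = np.linspace(-2.0, 2.0, 9)       # values taken by covariate 0
ice = ice_curves(toy_model, X, 0, grid)
print(ice.shape, pdp(ice).shape)       # (5, 9) (9,)
```

Each row of `ice` is one instance's ICE curve, and the PDP is simply the column mean of that matrix; the constraint penalties in ICEnet act on differences along the rows of such a matrix.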

Theorems & Definitions (1)

Remark 3.2
