Detection of Interacting Variables for Generalized Linear Models via Neural Networks

Yevhen Havrylenko; Julia Heger

TL;DR

This work automates the discovery of variable interactions that improve GLMs for insurance claim frequencies. A Combined Actuarial Neural Network (CANN) first boosts a fixed benchmark GLM; a model-specific Neural Interaction Detection (NID) method then ranks the interactions the network has learned; finally, mini-GLMs identify the next-best interaction to add. The method is validated on an artificially generated data set with known true interactions and on the open freMTPL2freq data set, where it performs competitively and runs substantially faster than traditional H-statistic approaches. It handles high-cardinality categorical variables via embedding layers and outlines a workflow suited to large-scale MTPL data, where timely iterative refinement of GLMs is valuable for actuaries. Overall, the approach offers a data-driven, interpretable, and scalable way to enhance GLMs progressively without overhauling existing pricing structures.
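
The CANN idea mentioned above can be illustrated with a minimal NumPy sketch. It assumes the common CANN formulation in which the log-link predictor of the fixed GLM enters as a non-trainable skip connection and a small network adds a correction before exponentiation. The function name `cann_predict`, the single `tanh` hidden layer, and all parameter names and values are hypothetical choices for illustration, not the architecture used in the paper.

```python
import numpy as np

def cann_predict(X, exposure, glm_beta0, glm_beta, W1, b1, w2, b2):
    """Expected claim counts from a toy CANN (illustrative assumption only).

    The GLM coefficients stay fixed; only W1, b1, w2, b2 would be trained.
    """
    glm_log_rate = glm_beta0 + X @ glm_beta      # fixed GLM linear predictor (log link)
    hidden = np.tanh(X @ W1.T + b1)              # one hidden layer, shape (n, units)
    nn_adjustment = hidden @ w2 + b2             # network correction on the log scale
    return exposure * np.exp(glm_log_rate + nn_adjustment)

# Toy inputs (made up for this sketch).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
exposure = np.full(5, 0.5)
beta0, beta = -2.0, np.array([0.3, -0.1, 0.2])
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)

# With zero output weights the CANN reproduces the benchmark GLM exactly.
glm_only = exposure * np.exp(beta0 + X @ beta)
cann_start = cann_predict(X, exposure, beta0, beta, W1, b1, np.zeros(4), 0.0)

# Non-zero output weights let the network deviate from the GLM.
cann_moved = cann_predict(X, exposure, beta0, beta, W1, b1, np.full(4, 0.1), 0.0)
```

Starting the output layer at zero means training begins at the benchmark GLM, so any improvement in the loss can be attributed to structure the GLM is missing, such as interactions.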

Abstract

The quality of generalized linear models (GLMs), frequently used by insurance companies, depends on the choice of interacting variables. The search for interactions is time-consuming, especially for data sets with a large number of variables, depends heavily on the expert judgement of actuaries, and often relies on visual performance indicators. We therefore present an approach to automating the search for interactions that should be added to GLMs to improve their predictive power. Our approach relies on neural networks and a model-specific interaction detection method, which is computationally faster than traditionally used methods such as Friedman's H-statistic or SHAP values. In numerical studies, we report the results of our approach on artificially generated data as well as open-source data.

Paper Structure (19 sections, 14 equations, 6 figures, 9 tables)

Introduction
Generalized linear models for modeling insurance claim frequencies
Algorithmic detection of the strongest interaction missing in a GLM
Outperforming the benchmark GLM via CANN
Opening the black box: ranking learned interactions
Identification of the next-best interaction for a GLM
Case studies
Artificial data set
Step 1: Training CANN
Step 2: Ranking of learned interactions via neural interaction detection
Step 3: Recommendation of the next-best interaction
Open-source data set freMTPL2freq
Step 1: Training CANN
Step 2: Ranking of learned interactions
Step 3: Recommendation of the next-best interaction
...and 4 more sections

Figures (6)

Figure 1: Architecture of a CANN model
Figure 2: Example of the NN part of a CANN model that uses a $2$-dimensional embedding layer (in light blue) encoding a categorical feature $\tilde{x}_{\cdot, 7}$.
Figure 3: Lift plots.
Figure 4: Generation of interactions in the first hidden layer and propagation of these interactions through the network. Figure adapted from Tsang et al. (2017).
Figure 5: Illustration of the interaction-strength calculation: evaluate all neurons of the first hidden layer by measuring the in-going and outgoing paths and then aggregate the results.
...and 1 more figure
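
The interaction-strength calculation described in the captions of Figures 4 and 5 can be sketched as follows. This is a minimal illustration of pairwise neural interaction detection in the style of Tsang et al., assuming a plain feed-forward network, the minimum as the aggregation over incoming first-layer weights, and products of absolute weight matrices for the outgoing paths. The function name `pairwise_nid_scores` and the toy weights are hypothetical, and the paper's handling of embedding layers is not covered here.

```python
from itertools import combinations

import numpy as np

def pairwise_nid_scores(hidden_weights, w_out):
    """Rank feature pairs by NID-style interaction strength (illustrative sketch).

    hidden_weights[0] has shape (units_1, n_features); each later matrix has
    shape (units_l, units_{l-1}). w_out has shape (units_last,).
    """
    # Outgoing paths: influence of each first-hidden-layer neuron on the output.
    influence = np.abs(w_out)
    for W in reversed(hidden_weights[1:]):
        influence = influence @ np.abs(W)

    # Incoming paths: a neuron can only model an interaction between two
    # features if both feed into it, hence the minimum of the two weights.
    W1 = np.abs(hidden_weights[0])
    scores = {}
    for j, k in combinations(range(W1.shape[1]), 2):
        per_neuron = influence * np.minimum(W1[:, j], W1[:, k])
        scores[(j, k)] = float(per_neuron.sum())   # aggregate over neurons

    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy network: 3 features, 2 first-layer neurons, 1 second-layer neuron.
W1 = np.array([[1.0, 2.0, 0.0],
               [0.0, 0.5, 3.0]])
W2 = np.array([[1.0, -1.0]])
w_out = np.array([2.0])

ranking = pairwise_nid_scores([W1, W2], w_out)
```

In this toy example the pair of features 0 and 1 ranks first because both feed strongly into the same first-layer neuron. Only the trained weights are needed, with no additional model evaluations on the data, which is consistent with the speed advantage over the H-statistic reported above.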
