Detection of Interacting Variables for Generalized Linear Models via Neural Networks
Yevhen Havrylenko, Julia Heger
TL;DR
This work automates the discovery of variable interactions that improve GLMs for insurance claim frequencies. A Combined Actuarial Neural Network (CANN) boosts a fixed benchmark GLM, a model-specific Neural Interaction Detection (NID) step ranks the interactions the network has learned, and mini-GLMs identify the next-best interaction to add. The method is validated on an artificially generated dataset with known true interactions and on the open freMTPL2freq dataset, where it performs competitively and runs substantially faster than traditional H-statistic approaches. It also demonstrates practical handling of high-cardinality categorical variables via embedding layers and outlines a workflow suitable for large-scale MTPL data, where timely iterative refinement of GLMs is valuable for actuaries. Overall, the approach offers a data-driven, interpretable, and scalable way to enhance GLMs step by step without overhauling existing pricing structures.
Abstract
The quality of generalized linear models (GLMs), frequently used by insurance companies, depends on the choice of interacting variables. The search for interactions is time-consuming, especially for data sets with a large number of variables, depends heavily on the expert judgement of actuaries, and often relies on visual performance indicators. Therefore, we present an approach to automating the process of finding interactions that should be added to GLMs to improve their predictive power. Our approach relies on neural networks and a model-specific interaction detection method, which is computationally faster than traditionally used methods such as Friedman's H-statistic or SHAP values. In numerical studies, we provide the results of our approach on artificially generated data as well as open-source data.
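To make the interaction detection step more concrete, the following is a minimal Python sketch of how a weight-based pairwise interaction ranking could look. It follows the general idea of Neural Interaction Detection as commonly described (the influence of each first-hidden-layer unit on the output is aggregated from absolute weights, and a pair of inputs is scored by the smaller of its two absolute incoming weights at that unit). It is not the authors' implementation: the function name `pairwise_nid_strengths`, the weight layout, and the use of the minimum as the aggregation are assumptions made for illustration, and the sketch ignores the GLM skip connection and the embedding layers of the CANN.

```python
from itertools import combinations

import numpy as np


def pairwise_nid_strengths(hidden_weights, output_weights):
    """Rank input pairs by a weight-based interaction strength.

    hidden_weights: list [W1, ..., WL] of hidden-layer weight matrices,
        each of shape (units_out, units_in); W1 has one column per input.
        (Hypothetical layout, chosen for this sketch.)
    output_weights: vector of shape (units_L,) mapping the last hidden
        layer to the scalar output.
    Returns a list of ((i, j), strength) sorted by decreasing strength.
    """
    first = np.abs(np.asarray(hidden_weights[0], dtype=float))

    # Aggregate the influence of each first-layer unit on the output by
    # multiplying absolute weight matrices from the output backwards.
    influence = np.abs(np.asarray(output_weights, dtype=float))
    for weights in reversed(hidden_weights[1:]):
        influence = influence @ np.abs(np.asarray(weights, dtype=float))

    n_inputs = first.shape[1]
    strengths = {}
    for i, j in combinations(range(n_inputs), 2):
        # A hidden unit can only model an interaction of i and j if both
        # incoming weights are non-negligible, hence the minimum.
        per_unit = np.minimum(first[:, i], first[:, j])
        strengths[(i, j)] = float(np.sum(influence * per_unit))

    return sorted(strengths.items(), key=lambda item: item[1], reverse=True)
```

Because the scores are read directly from the trained weights, no additional model evaluations on the data are needed. This is one plausible reason a weight-based ranking can be faster than permutation-style measures such as the H-statistic. The ranked pairs would then serve only as candidates to be checked by refitting small GLMs.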
