A tree-based varying coefficient model

Henning Zakrisson; Mathias Lindholm

TL;DR

The paper tackles interpretability in flexible mean modelling by introducing a tree-based varying-coefficient model (TVCM) that learns the coefficient functions $\beta_j(z)$ with a cyclic gradient boosting machine (CGBM). The framework enables dimension-wise early stopping and per-coefficient feature importance (FI) scores, which make the model easy to interpret and allow automatic feature selection, while keeping predictive performance comparable to that of neural-network VCMs such as LocalGLMnet. Results on simulated data and on a large real insurance data set (FreMTPLfreq2) show that the TVCM matches or outperforms GLM and GBM baselines, and that the coefficient-function FI scores give meaningful insight into interactions. The approach offers a scalable, interpretable alternative to black-box VCMs for actuarial and related applications, and code is available for implementation and extension.
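For orientation, the model class described above can be written in the usual varying-coefficient form. This is the standard VCM formulation rather than a formula quoted from the paper: the link function $g$ and the number of features $p$ are notation assumed here, $\bm{z}$ denotes the effect modifiers, and the constant intercept $\beta_0$ follows the caption of Figure 2.

```latex
g\bigl(\mu(\bm{x}, \bm{z})\bigr) = \beta_0 + \sum_{j=1}^{p} \beta_j(\bm{z})\, x_j
```

If every $\beta_j(\bm{z})$ is constant in $\bm{z}$, the expression reduces to an ordinary GLM, which is why the coefficient functions can be read like locally varying regression coefficients.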

Abstract

The paper introduces a tree-based varying coefficient model (VCM) where the varying coefficients are modelled using the cyclic gradient boosting machine (CGBM) from Delong et al. (2023). Modelling the coefficient functions using a CGBM allows for dimension-wise early stopping and feature importance scores. The dimension-wise early stopping not only reduces the risk of dimension-specific overfitting, but also reveals differences in model complexity across dimensions. The use of feature importance scores allows for simple feature selection and easy model interpretation. The model is evaluated on the same simulated and real data examples as those used in Richman and Wüthrich (2023), and the results show that its out-of-sample loss is comparable to that of their neural network-based VCM, LocalGLMnet.
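The idea of cyclic boosting with dimension-wise early stopping can be illustrated with a deliberately simplified sketch. It is not the authors' algorithm or code: it assumes squared-error loss with an identity link, uses single-split stumps as weak learners, starts each coefficient function at zero instead of at a GLM fit, and takes the per-dimension stopping points `kappa` as given rather than selecting them by early stopping. All function names and the toy data are hypothetical.

```python
import random

def fit_stump(X, g):
    """Least-squares regression stump for targets g: (feature, threshold, left, right)."""
    n, p = len(X), len(X[0])
    total, best = sum(g), None
    for f in range(p):
        order = sorted(range(n), key=lambda i: X[i][f])
        left_sum = 0.0
        for rank in range(1, n):
            i = order[rank - 1]
            left_sum += g[i]
            if X[i][f] == X[order[rank]][f]:
                continue  # cannot split between tied values
            # Maximising this score is equivalent to minimising squared error.
            score = left_sum ** 2 / rank + (total - left_sum) ** 2 / (n - rank)
            if best is None or score > best[0]:
                best = (score, f, X[i][f], left_sum / rank, (total - left_sum) / (n - rank))
    return best[1:]

def predict_stump(stump, x):
    f, threshold, left, right = stump
    return left if x[f] <= threshold else right

def cyclic_boost(X, y, kappa, lr=0.1):
    """Boost beta_1..beta_p in turn; dimension j receives kappa[j] stumps in total."""
    n, p = len(X), len(X[0])
    beta0 = sum(y) / n                      # constant intercept (assumption: mean of y)
    beta = [[0.0] * p for _ in range(n)]    # beta[i][j] = current beta_j(x_i)
    stumps = [[] for _ in range(p)]
    for k in range(max(kappa)):
        for j in range(p):
            if k >= kappa[j]:
                continue                    # dimension j has reached its stopping point
            resid = [y[i] - beta0 - sum(b * x for b, x in zip(beta[i], X[i]))
                     for i in range(n)]
            # Negative gradient of 0.5 * (y - mu)^2 with respect to beta_j.
            g = [resid[i] * X[i][j] for i in range(n)]
            stump = fit_stump(X, g)
            stumps[j].append(stump)
            for i in range(n):
                beta[i][j] += lr * predict_stump(stump, X[i])
    return beta0, stumps, lr

def coefficients(model, x):
    """Evaluate the fitted coefficient functions beta_1(x), ..., beta_p(x)."""
    _, stumps, lr = model
    return [lr * sum(predict_stump(s, x) for s in dim) for dim in stumps]

def predict(model, x):
    return model[0] + sum(b * xj for b, xj in zip(coefficients(model, x), x))

def mean_squared_error(model, X, y):
    return sum((yi - predict(model, xi)) ** 2 for xi, yi in zip(X, y)) / len(y)

random.seed(0)
X = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(200)]
# Toy truth: beta_1 flips sign with x_2 (an interaction), beta_2 is constant.
y = [(1.0 if x[1] > 0 else -1.0) * x[0] + 0.5 * x[1] + random.gauss(0, 0.1) for x in X]

model = cyclic_boost(X, y, kappa=[30, 10])
baseline = (sum(y) / len(y), [[], []], 0.1)   # intercept-only model for comparison
print(mean_squared_error(baseline, X, y), mean_squared_error(model, X, y))
```

Giving each dimension its own entry in `kappa` is what allows a simple coefficient function to stop after a few trees while a more complex one keeps training, so the resulting stopping points can be read as a rough measure of per-dimension complexity.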

Paper Structure (19 sections, 35 equations, 6 figures, 5 tables, 2 algorithms)

Introduction
Model architecture
Generalised linear models
VCMs and local GLMs
The Cyclic Gradient Boosting Machine
A cyclically boosted VCM
Comments on convergence
Modeling considerations
Feature selection
Interaction effects
Categorical features
Examples
Simulated data
Real data
Conclusion
...and 4 more sections

Figures (6)

Figure 1: Estimated means $\widehat{\mu}(\bm{x})$ vs true regression function $\mu(\bm{x})$ for $500$ randomly selected observations from the test set for the GLM and TVCMs respectively.
Figure 2: Feature importance scores for the simulated data example. Cell $(x_j,\widehat{\beta}_k)$ corresponds to the feature importance score of feature $x_j$ for the coefficient function $\widehat{\beta}_k$, i.e. the relative loss reduction from splits in the trees of coefficient function estimate $\widehat{\beta}_k$ that used feature $x_j$. All rows are normalised to sum to one. Note that $\widehat{\beta}_0$ is the intercept and does not depend on any features, and that $\kappa_1 = 0$ means that coefficient function estimate $\widehat{\beta}_1$ consists of the GLM initiation only.
Figure 3: Coefficient function estimates $\widehat{\beta}_j(\bm{x})$ for the TVCM estimates on the simulated data set for different values of $x_l$, where $l$ is the effect modifier with the highest feature importance score for each coefficient function. The GLM initiation $\widehat{\beta}_j^{(0)}$ is also shown, as well as the true regression definitions from Table \ref{tab:simulated_true_regression_attentions}.
Figure 4: Out of sample predictions of $\mu(\bm{x})$ on the real data set, where $\mu(\bm{x}) = \mathbb{E}[Y_i \mid \bm{X} = \bm{x}_i, W = w_i]/w_i$. The predictions have been ordered according to the $\mu(\bm{x})$-predictions from the TVCM and averaged using a rolling mean of $1\,000$. For the true observations $y$, the rolling mean is taken over the true number of claims and divided by the rolling mean of the duration of exposure.
Figure 5: Feature importance scores for the real data example. Cell $(j,\widehat{\beta}_k)$ corresponds to the feature importance score of feature $j$ for the coefficient function estimate $\widehat{\beta}_k$, i.e. the relative loss reduction from splits in the trees of coefficient function estimate $\widehat{\beta}_k$ that used feature $j$. All rows are normalised to sum to one. For the categorical features (VehGas, VehBrand, and Region), the feature importance scores have been summed over the different categories before normalisation.
...and 1 more figure
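The row normalisation described in the captions of Figures 2 and 5 can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the dictionary layout, the column names, and the numbers are all hypothetical and chosen only to show the arithmetic.

```python
def normalise_importance(loss_reduction, groups=None):
    """Turn summed loss reductions into feature importance scores per coefficient.

    loss_reduction: {coefficient: {feature: total loss reduction from its splits}}
    groups: optional {one-hot column: categorical feature} used to sum the
            categories of a categorical feature before normalising.
    """
    groups = groups or {}
    scores = {}
    for coef, by_feature in loss_reduction.items():
        merged = {}
        for feature, value in by_feature.items():
            name = groups.get(feature, feature)
            merged[name] = merged.get(name, 0.0) + value
        total = sum(merged.values())
        # A coefficient function without any trees has no splits to score.
        scores[coef] = {name: (v / total if total > 0 else 0.0)
                        for name, v in merged.items()}
    return scores

# Illustrative numbers only, not results from the paper.
toy = {
    "beta_1": {"DrivAge": 0.0, "VehGas_Diesel": 0.0, "VehGas_Regular": 0.0},
    "beta_2": {"DrivAge": 2.0, "VehGas_Diesel": 1.0, "VehGas_Regular": 1.0},
}
one_hot = {"VehGas_Diesel": "VehGas", "VehGas_Regular": "VehGas"}
fi = normalise_importance(toy, one_hot)
print(fi["beta_2"])
```

In this toy input, `beta_1` mimics a coefficient function that stopped at zero boosting steps, so its row stays at zero instead of being normalised.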

Theorems & Definitions (3)

Remark 1
Remark 2
Remark 3
