A tree-based varying coefficient model
Henning Zakrisson, Mathias Lindholm
TL;DR
The paper tackles interpretability in flexible mean modeling by introducing a tree-based varying coefficient model (TVCM) in which the coefficient functions $β_j(z)$ are learned with a cyclic gradient boosting machine (CGBM). This framework enables dimension-wise early stopping and per-coefficient feature importance, which make the model easy to interpret and allow automatic feature selection, while predictive performance remains comparable to that of neural-network VCMs such as LocalGLMnet. Empirical results on simulated data and on a large real insurance dataset (freMTPL2freq) show that TVCM matches or outperforms GLM and GBM baselines, and that the feature importance scores of the individual coefficient functions give meaningful insight into interactions. The approach offers a scalable, interpretable alternative to black-box VCMs with practical utility in actuarial and related domains, and code is available for implementation and extension.
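For concreteness, a generic VCM can be written as below, with $x$ the covariates that enter linearly, $z$ the covariates on which the coefficients vary, and $g$ a link function. Assuming a Poisson response with log link (a common choice for claim frequency, but an assumption here, not stated in this section), the negative log-likelihood $\ell_i = \mu_i - y_i \log \mu_i$ gives the per-coefficient negative gradient that a boosting machine would fit when updating $\beta_j$. The special case $z = x$ with a constant intercept $\beta_0$ is the form used by LocalGLMnet.

$$
g\big(\mu(x, z)\big) = \beta_0(z) + \sum_{j=1}^{p} \beta_j(z)\, x_j,
\qquad
-\frac{\partial \ell_i}{\partial \beta_j(z_i)} = (y_i - \mu_i)\, x_{ij}, \quad x_{i0} := 1 .
$$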
Abstract
The paper introduces a tree-based varying coefficient model (VCM) where the varying coefficients are modelled using the cyclic gradient boosting machine (CGBM) from Delong et al. (2023). Modelling the coefficient functions using a CGBM allows for dimension-wise early stopping and feature importance scores. The dimension-wise early stopping not only reduces the risk of dimension-specific overfitting, but also reveals differences in model complexity across dimensions. The use of feature importance scores allows for simple feature selection and easy model interpretation. The model is evaluated on the same simulated and real data examples as those used in Richman and Wüthrich (2023), and the results show that its out-of-sample loss is comparable to that of their neural network-based VCM, LocalGLMnet.
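The sketch below illustrates the mechanism the abstract describes: cycling over coefficient dimensions, fitting one tree per dimension to that coefficient's negative gradient, stopping each dimension separately, and accumulating feature importance per coefficient. It rests on assumptions not stated in this section: a Poisson response with log link, depth-1 regression trees (stumps) fitted by least squares, a fixed learning rate, and a simplified stopping rule in which a dimension stops the first time its candidate update fails to lower a held-out loss. It is not the authors' CGBM implementation, and the function names (`fit_stump`, `predict_stump`, `poisson_nll`, `fit_tree_vcm`) are hypothetical. Feature importance is taken here as the total squared-error reduction of the splits on each $z$-feature within each coefficient's trees.

```python
import numpy as np

# Hypothetical sketch: Poisson/log-link VCM boosted cyclically with stumps and a
# simplified per-dimension early-stopping rule (not the authors' CGBM code).


def fit_stump(z, r):
    """Least-squares depth-1 tree of r on the columns of z.
    Returns (feature, threshold, left_value, right_value, sse_gain)."""
    n, d = z.shape
    base_sse = np.sum((r - r.mean()) ** 2)
    best = (0, np.inf, r.mean(), r.mean(), 0.0)  # fallback: constant fit
    n_left = np.arange(1, n)  # left-child size for each split position
    for k in range(d):
        order = np.argsort(z[:, k], kind="stable")
        zs, rs = z[order, k], r[order]
        cs, cs2 = np.cumsum(rs), np.cumsum(rs ** 2)
        sl, sl2 = cs[:-1], cs2[:-1]
        sr, sr2 = cs[-1] - sl, cs2[-1] - sl2
        sse = (sl2 - sl ** 2 / n_left) + (sr2 - sr ** 2 / (n - n_left))
        # Only allow splits between distinct feature values.
        gain = np.where(zs[1:] > zs[:-1], base_sse - sse, -np.inf)
        i = int(np.argmax(gain))
        if gain[i] > best[4]:
            best = (k, 0.5 * (zs[i] + zs[i + 1]),
                    sl[i] / n_left[i], sr[i] / (n - n_left[i]), float(gain[i]))
    return best


def predict_stump(stump, z):
    k, thr, left, right, _ = stump
    return np.where(z[:, k] <= thr, left, right)


def poisson_nll(y, eta):
    """Mean Poisson negative log-likelihood with log link, dropping log(y!)."""
    return np.mean(np.exp(eta) - y * eta)


def fit_tree_vcm(x, z, y, x_val, z_val, y_val, lr=0.1, max_rounds=200):
    """Cyclic boosting of beta_0(z), ..., beta_p(z) in
    log mu = beta_0(z) + sum_j beta_j(z) * x_j."""
    X = np.column_stack([np.ones(len(y)), x])          # x_0 = 1 (intercept)
    Xv = np.column_stack([np.ones(len(y_val)), x_val])
    dims, d = X.shape[1], z.shape[1]
    init = np.zeros(dims)
    init[0] = np.log(y.mean())                          # constant starting model
    beta = np.tile(init, (len(y), 1))                   # beta_j(z_i) on training data
    beta_v = np.tile(init, (len(y_val), 1))             # beta_j(z_i) on validation data
    trees = [[] for _ in range(dims)]
    importance = np.zeros((dims, d))                    # per-coefficient importance
    active = [True] * dims
    val_loss = poisson_nll(y_val, np.sum(beta_v * Xv, axis=1))
    for _ in range(max_rounds):
        if not any(active):
            break
        for j in range(dims):                           # one cycle over dimensions
            if not active[j]:
                continue
            eta = np.sum(beta * X, axis=1)
            neg_grad = (y - np.exp(eta)) * X[:, j]      # -dL/d beta_j(z_i)
            stump = fit_stump(z, neg_grad)
            cand_v = beta_v.copy()
            cand_v[:, j] += lr * predict_stump(stump, z_val)
            cand_loss = poisson_nll(y_val, np.sum(cand_v * Xv, axis=1))
            if cand_loss >= val_loss:
                active[j] = False                       # stop this dimension only
                continue
            beta[:, j] += lr * predict_stump(stump, z)
            beta_v, val_loss = cand_v, cand_loss
            trees[j].append(stump)
            importance[j, stump[0]] += stump[4]
    return init, trees, importance


# Usage on simulated data where only z_0 drives the slope coefficient beta_1.
rng = np.random.default_rng(0)
n = 3000
z = rng.uniform(-1, 1, size=(n, 3))
x = rng.uniform(-1, 1, size=(n, 1))
beta1 = np.where(z[:, 0] > 0, 1.0, 0.0)
y = rng.poisson(np.exp(0.5 + beta1 * x[:, 0]))
tr, va = slice(0, 2000), slice(2000, None)
init, trees, fi = fit_tree_vcm(x[tr], z[tr], y[tr], x[va], z[va], y[va])
print(fi[1] / fi[1].sum())  # share of beta_1's importance per z-feature
```

Because each coefficient keeps its own stopping flag, a coefficient with little structure can stop after a few trees while one with a strong interaction keeps growing, which is the kind of difference in model complexity across dimensions that the abstract refers to. Row `fi[j]` then serves as the feature importance of coefficient function $\beta_j$ alone.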
