Inductive inference of gradient-boosted decision trees on graphs for insurance fraud detection
Félix Vandervorst, Bruno Deprez, Wouter Verbeke, Tim Verdonck
TL;DR
This work tackles insurance fraud detection on heterogeneous and dynamic graphs by introducing G-GBM, an inductive gradient-boosted tree framework that leverages probability-weighted metapaths over ego-nets. By combining heterogeneous information networks with path-based feature representations and weighted splits, G-GBM performs competitively with, or better than, GraphSAGE and HinSAGE across simulated and real-world datasets, while SHAP analyses make its predictions explainable. The approach preserves the strengths of tree-based methods (native handling of categorical features and missing values, and interpretability) and extends them to graphs without iterative neighborhood aggregation. Empirically, G-GBM shows Pareto-dominant performance and practical utility for fraud detection in evolving networks, and open-source code is provided to support reproducibility and extension.
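To make "probability-weighted metapaths over ego-nets" more concrete, here is a loose illustration. It is not the G-GBM construction, whose weighting scheme and weighted splits may differ. The sketch assumes that a path's weight equals its random-walk probability (the product of 1/degree along the path) and that features are aggregated per metapath, meaning per sequence of node types such as `claim-person-claim`. The toy graph, the labels, and the helpers `build_adjacency` and `metapath_features` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy insurance graph: node id -> node type.
NODE_TYPE = {
    "c1": "claim", "c2": "claim", "c3": "claim",
    "p1": "person", "p2": "person",
    "pol1": "policy",
}
# Undirected edges between claims, persons and policies (made-up data).
EDGES = [("c1", "p1"), ("c2", "p1"), ("c2", "pol1"), ("c3", "pol1"), ("c3", "p2")]
# Known fraud labels of previously investigated claims (made-up data).
LABELS = {"c2": 1.0, "c3": 0.0}


def build_adjacency(edges):
    """Return an undirected adjacency list."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def metapath_features(target, adj, node_type, labels, max_hops=2):
    """Tabular features for `target` from simple paths in its max_hops ego-net.

    Assumption: each path is weighted by its random-walk probability
    (product of 1/degree along the path). Per metapath we sum:
      '<metapath>:mass'  -> total probability of paths of that type
      '<metapath>:fraud' -> probability-weighted labels of labelled end nodes
    """
    feats = defaultdict(float)
    # Frontier entries: (current node, metapath so far, path probability, visited set)
    frontier = [(target, node_type[target], 1.0, frozenset([target]))]
    for _ in range(max_hops):
        next_frontier = []
        for node, mpath, prob, visited in frontier:
            neighbours = adj[node]
            for nb in neighbours:
                if nb in visited:
                    continue  # simple paths only; never reuses the target's own label
                p = prob / len(neighbours)
                new_mpath = f"{mpath}-{node_type[nb]}"
                feats[f"{new_mpath}:mass"] += p
                if nb in labels:
                    feats[f"{new_mpath}:fraud"] += p * labels[nb]
                next_frontier.append((nb, new_mpath, p, visited | {nb}))
        frontier = next_frontier
    return dict(feats)


if __name__ == "__main__":
    adj = build_adjacency(EDGES)
    for claim in ("c1", "c3"):
        print(claim, metapath_features(claim, adj, NODE_TYPE, LABELS))
```

Each claim yields a dictionary whose keys depend on its neighborhood. Stacking these dictionaries into a table leaves missing values wherever a metapath does not occur, which gradient-boosted trees can handle natively. Because the features depend only on a node's ego-net, they can also be computed for claims that were not in the training graph, which is the inductive setting the TL;DR refers to.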
Abstract
Graph-based methods are becoming increasingly popular in machine learning due to their ability to model complex data and relations. Insurance fraud is a prime use case, since false claims are often the work of organised criminals who stage accidents, or of the same persons filing erroneous claims on multiple policies. One challenge is that graph-based approaches struggle to learn meaningful representations of the data because of the high class imbalance in fraud data. Another is that insurance networks are heterogeneous and dynamic, as the relations among people, companies and policies change over time. That is why gradient-boosted tree approaches on tabular data still dominate the field. To bring these strengths to graph data, we present a novel inductive graph gradient boosting machine (G-GBM) for supervised learning on heterogeneous and dynamic graphs. We show that our estimator competes with popular graph neural network approaches in an experiment on a variety of simulated random graphs. We demonstrate the power of G-GBM for insurance fraud detection on an open-source dataset and on a real-world, proprietary dataset. Because the backbone model is a gradient boosting forest, we can apply established explainability methods to gain better insight into the predictions made by G-GBM.
