Distributional Refinement Network: Distributional Forecasting via Deep Learning

Benjamin Avanzi; Eric Dong; Patrick J. Laub; Bernard Wong

TL;DR

The paper tackles forecasting the full conditional distribution of losses in actuarial contexts by introducing the Distributional Refinement Network (DRN), which refines a transparent baseline model (such as a GLM) with a flexible neural component inspired by Deep Distribution Regression (DDR). The baseline preserves interpretability, while a partitioned density refinement adds distributional flexibility by adjusting the baseline probability masses on carefully chosen intervals. Training uses an objective based on joint binary cross-entropy (JBCE), augmented with KL and roughness regularisers that keep the refinement stable and close to the baseline, plus an optional mean regulariser that respects the baseline's mean predictions. Across synthetic and real datasets, the DRN improves distributional forecasting (e.g., lower CRPS and NLL, better calibration) and yields interpretable insights via Kernel SHAP decompositions, which explain the refinement both locally and globally. The approach offers a practical way to use deep learning for distributional actuarial forecasting without sacrificing core interpretability, with potential applicability beyond insurance analytics.
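
As a rough illustration of the training objective summarised above, the following NumPy sketch combines a JBCE term with KL and roughness penalties. The exact forms are assumptions rather than the paper's equations: JBCE is taken to be a sum of binary cross-entropies on the events {Y <= c_k} at the interior cutpoints, the KL term is taken from the baseline masses to the refined masses, and roughness is taken to be the squared second differences of the log adjustment factors. The names `jbce`, `drn_loss`, `alpha1` and `alpha2` are illustrative, and the optional mean regulariser is omitted.

```python
import numpy as np

def jbce(probs, y, cutpoints, eps=1e-12):
    """Assumed JBCE: binary cross-entropy of 1{y <= c_k}, summed over interior cutpoints.

    probs: (n, K) predicted masses on (c_0, c_1], ..., (c_{K-1}, c_K]
    y: (n,) observed losses; cutpoints: (K + 1,) increasing values c_0 < ... < c_K
    """
    cdf = np.cumsum(probs, axis=1)[:, :-1]         # F(c_1), ..., F(c_{K-1})
    cdf = np.clip(cdf, eps, 1.0 - eps)             # guard the logarithms
    below = y[:, None] <= cutpoints[None, 1:-1]    # indicators 1{y <= c_k}
    loglik = np.where(below, np.log(cdf), np.log1p(-cdf))
    return -loglik.sum(axis=1).mean()

def drn_loss(probs, base, y, cutpoints, alpha1=0.0, alpha2=0.0, eps=1e-12):
    """JBCE plus assumed KL (weight alpha1) and roughness (weight alpha2) penalties."""
    p = np.clip(probs, eps, None)
    b = np.clip(base, eps, None)
    # KL divergence from the baseline masses to the refined masses, averaged over rows.
    kl = np.sum(b * (np.log(b) - np.log(p)), axis=1).mean()
    # Roughness: squared second differences of the log adjustment factors log(p_k / b_k).
    log_adj = np.log(p) - np.log(b)
    rough = np.sum(np.diff(log_adj, n=2, axis=1) ** 2, axis=1).mean()
    return jbce(probs, y, cutpoints, eps) + alpha1 * kl + alpha2 * rough

# Made-up inputs: two observations, three intervals.
cutpoints = np.array([0.0, 1.0, 2.0, 4.0])
base = np.array([[0.5, 0.3, 0.2], [0.4, 0.4, 0.2]])    # baseline masses
probs = np.array([[0.6, 0.3, 0.1], [0.3, 0.4, 0.3]])   # refined masses
y = np.array([0.7, 2.5])
loss = drn_loss(probs, base, y, cutpoints, alpha1=0.002, alpha2=0.1)
```

With both weights set to zero the loss reduces to the JBCE term alone; increasing `alpha1` pulls the refined masses back towards the baseline, which is the effect Figure 5 illustrates.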

Abstract

A key task in actuarial modelling involves modelling the distributional properties of losses. Classic (distributional) regression approaches like Generalized Linear Models (GLMs; Nelder and Wedderburn, 1972) are commonly used, but challenges remain in developing models that can (i) allow covariates to flexibly impact different aspects of the conditional distribution, (ii) integrate developments in machine learning and AI to maximise the predictive power while considering (i), and (iii) maintain a level of interpretability in the model to enhance trust in the model and its outputs, which is often compromised in efforts pursuing (i) and (ii). We tackle this problem by proposing a Distributional Refinement Network (DRN), which combines an inherently interpretable baseline model (such as GLMs) with a flexible neural network, namely a modified Deep Distribution Regression (DDR; Li et al., 2019) method. Inspired by the Combined Actuarial Neural Network (CANN; Schelldorfer and Wüthrich, 2019), our approach flexibly refines the entire baseline distribution. As a result, the DRN captures varying effects of features across all quantiles, improving predictive performance while maintaining adequate interpretability. Using both synthetic and real-world data, we demonstrate the DRN's superior distributional forecasting capacity. The DRN has the potential to be a powerful distributional regression model in actuarial science and beyond.

Paper Structure (39 sections, 33 equations, 18 figures, 6 tables, 1 algorithm)

Introduction
General Setting and Notation
Existing Distributional Regression Techniques
Parametric Models
Generalised Linear Model
Generalised Additive Models for Location, Scale, and Shape
Distributional Neural Network
Semiparametric Models
(Deep) Quantile Regression
(Deep) Distribution Regression
Distributional Refinement Network
Architecture Design
Baseline Model Integration
Output Generation
Distributional Forecasting
...and 24 more sections

Figures (18)

Figure 1: A schematic view of our proposed deep learning framework designed for distributional forecasting. The Distributional Refinement Network (DRN) inputs a feature set and a density estimate provided by a parametric baseline model, such as a GLM. The DRN then outputs a refined version of the initial density estimate.
Figure 2: Demonstration of the DDR model proposed by Li et al.
Figure 3: The schematic illustrates the architecture of the Distributional Refinement Network (DRN). Notably, the baseline model depicted in the bottom left is not directly fed into the network. Instead, the DRN employs the summarised baseline information, $\boldsymbol{b}(\boldsymbol{x};\boldsymbol{\beta}) = (\hat{b}_1(\boldsymbol{x}),\ldots, \hat{b}_K(\boldsymbol{x}))^{\top}$, which comprises the predicted baseline probability masses as outlined in Equation \ref{eq: baseline prob masses}. Represented by the brown neurons, this information is introduced into the DRN as a skip-connection, combined with the "raw" outputs, $\boldsymbol{l}(\boldsymbol{x};\boldsymbol{w}) = (\hat{l}_1(\boldsymbol{x}),\ldots, \hat{l}_K(\boldsymbol{x}))^{\top}$. These "raw" outputs are obtained from the feature inputs $\boldsymbol{x} = (x_1,\ldots,x_{p})^{\top}$ (top left) after propagation through the hidden layers, depicted by the blue neurons. The red arrows in the figure signify the process of computing adjustment factors $\boldsymbol{a}(\boldsymbol{x};\boldsymbol{w},\boldsymbol{\beta}) = (\hat{a}_1(\boldsymbol{x}),\ldots,\hat{a}_K(\boldsymbol{x}))^{\top}$. They reflect the specific transformation and integration steps within the DRN, defined through Equation \ref{eq: adjustment factor}. This process remains unchanged during the backpropagation of weights. The trainable weights within the DRN are denoted by blue lines, while the grey arrows denote a series of transformations elaborated in Section \ref{Incorporating Baseline}. The network employs LeakyReLU for hidden layers to prevent inactive neuron issues and Linear for the output layer. Detailed explanations of these choices are provided in Appendix \ref{appendix: activation function choices}.
Figure 4: The figure showcases the adjustment factors $\hat{a}_k$'s applied to the density of $Y|\boldsymbol{X}=\boldsymbol{x}$, with dashed vertical lines marking the cutpoints, $c_k$'s. For illustration, five cutpoints $c_0<c_1<c_2<c_3<c_4$ form four intervals that require density modifications. The black line denotes the baseline model's density, while the blue line indicates the refined density as estimated by the DRN.
Figure 5: The conditional densities generated by the DRN are shown with and without KL regularisation applied. The left graph shows the unregularised DRN density estimate ($\alpha_1 = 0$) and the right graph shows KL regularisation applied ($\alpha_1 = 0.002$). The DRN density function (blue) is compared against the baseline GLM (black) density and the true density (red) for the specific observation $\boldsymbol{X}=(0.5, 0.5)$.
...and 13 more figures
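
The refinement step described in the captions of Figures 3 and 4 can be sketched in NumPy as follows. The paper's equation for the adjustment factors is not reproduced on this page, so the form used here is an assumption: the refined masses are taken to be proportional to $\hat{b}_k \exp(\hat{l}_k)$ (a softmax of the log baseline masses plus the raw outputs), and the refined density scales the baseline density by $\hat{a}_k$ on $(c_{k-1}, c_k]$ while leaving it unchanged outside the cutpoint range. The function names, the Exp(1) baseline and the raw outputs are all illustrative.

```python
import numpy as np

def adjustment_factors(b, l):
    """Factors a_k with refined masses p_k = a_k * b_k summing to sum_k b_k.

    Assumed form: p_k proportional to b_k * exp(l_k), i.e. a softmax of log(b) + l.
    """
    z = np.log(b) + l
    w = np.exp(z - z.max())            # subtract the max for numerical stability
    p = b.sum() * w / w.sum()          # refined masses keep the baseline's total mass
    return p / b

def refined_density(y, base_pdf, cutpoints, a):
    """Scale the baseline density by a_k on (c_{k-1}, c_k]; leave it unchanged elsewhere."""
    y = np.asarray(y, dtype=float)
    k = np.searchsorted(cutpoints, y, side="left") - 1    # 0-based interval index
    inside = (k >= 0) & (k < len(a))
    factor = np.where(inside, a[np.clip(k, 0, len(a) - 1)], 1.0)
    return factor * base_pdf(y)

# Illustrative Exp(1) baseline with five cutpoints (four intervals), as in Figure 4.
base_pdf = lambda y: np.exp(-y)
base_cdf = lambda y: 1.0 - np.exp(-y)
cutpoints = np.array([0.0, 0.5, 1.0, 2.0, 4.0])
b = np.diff(base_cdf(cutpoints))       # baseline masses on the four intervals
l = np.array([0.3, -0.2, 0.1, 0.0])    # made-up raw network outputs
a = adjustment_factors(b, l)
density = refined_density(np.array([0.25, 1.5, 5.0]), base_pdf, cutpoints, a)
```

Under this assumed form, raw outputs of zero give adjustment factors of one, so the network starts from the baseline density and only departs from it as the raw outputs move away from zero.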

Theorems & Definitions (2)

Remark 4.1
Definition 1: Probabilistically Calibrated (Gneiting et al., 2007)
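
Definition 1 is only named on this page, not stated. In Gneiting et al. (2007), probabilistic calibration essentially amounts to the probability integral transform (PIT) values $F_i(y_i)$ being uniform on $(0, 1)$. The sketch below shows one simple way to check this, under the assumption that a Kolmogorov-Smirnov distance to the uniform distribution is an acceptable summary. The simulated Exp(1) losses and the two forecast CDFs are illustrative and are not taken from the paper.

```python
import numpy as np

def ks_distance_to_uniform(u):
    """Kolmogorov-Smirnov distance between the empirical CDF of u and Uniform(0, 1)."""
    u = np.sort(np.asarray(u, dtype=float))
    n = len(u)
    upper = np.arange(1, n + 1) / n    # empirical CDF just after each sorted point
    lower = np.arange(0, n) / n        # empirical CDF just before each sorted point
    return max(np.max(upper - u), np.max(u - lower))

rng = np.random.default_rng(0)
y = rng.exponential(scale=1.0, size=5000)    # simulated losses, true law Exp(1)

pit_good = 1.0 - np.exp(-y)          # PIT under the correct forecast CDF
pit_bad = 1.0 - np.exp(-2.0 * y)     # PIT under a forecast with the wrong rate

d_good = ks_distance_to_uniform(pit_good)    # small when the forecast is calibrated
d_bad = ks_distance_to_uniform(pit_bad)      # large when it is not
```

A PIT histogram conveys the same information graphically: a flat histogram suggests calibration, while a skewed or U-shaped one suggests a biased or overconfident forecast.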
