Combining Structural and Unstructured Data: A Topic-based Finite Mixture Model for Insurance Claim Prediction

Yanxi Hou; Xiaolan Xia; Guangyuan Gao

TL;DR

This paper develops a joint mixture model that integrates claim descriptions and claim amounts. The model establishes a probabilistic link between textual descriptions and loss amounts, which improves the accuracy of claims clustering and prediction.

Abstract

Modeling insurance claim amounts and classifying claims into different risk levels are critical yet challenging tasks. Traditional predictive models for insurance claims often overlook the valuable information embedded in claim descriptions. This paper introduces a novel approach by developing a joint mixture model that integrates both claim descriptions and claim amounts. Our method establishes a probabilistic link between textual descriptions and loss amounts, enhancing the accuracy of claims clustering and prediction. In our proposed model, the latent topic/component indicator serves as a proxy for both the thematic content of the claim description and the component of loss distributions. Specifically, conditioned on the topic/component indicator, the claim description follows a multinomial distribution, while the claim amount follows a component loss distribution. We propose two methods for model calibration: an EM algorithm for maximum a posteriori estimates, and an MH-within-Gibbs sampler algorithm for the posterior distribution. The empirical study demonstrates that the proposed methods work effectively, providing interpretable claims clustering and prediction.
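As a rough illustration of the link described in the abstract, the sketch below computes, for fixed parameters, the posterior probability that a claim belongs to each component given both its word counts and its amount. It is a minimal sketch under stated assumptions, not the authors' implementation: the lognormal loss components, the function names, and all toy parameter values are hypothetical, and the Dirichlet priors and the parameter updates of the EM and Gibbs procedures are omitted.

```python
import numpy as np

def lognormal_logpdf(y, mu, sigma):
    """Log-density of a lognormal distribution evaluated at y > 0."""
    log_y = np.log(y)
    return (-log_y - np.log(sigma) - 0.5 * np.log(2.0 * np.pi)
            - 0.5 * ((log_y - mu) / sigma) ** 2)

def responsibilities(counts, y, pi, phi, mu, sigma):
    """Posterior component probabilities for each claim.

    counts : (n, V) word counts of each claim description
    y      : (n,)   positive claim amounts
    pi     : (K,)   mixing weights
    phi    : (K, V) word probabilities of each component
    mu, sigma : (K,) lognormal parameters (assumed loss family)
    """
    # Multinomial log-likelihood of the text under each component.
    # The multinomial coefficient is dropped: it does not depend on k.
    log_text = counts @ np.log(phi).T                              # (n, K)
    # Log-density of the claim amount under each component.
    log_loss = lognormal_logpdf(y[:, None], mu[None, :], sigma[None, :])
    log_post = np.log(pi)[None, :] + log_text + log_loss
    # Normalize in log space for numerical stability.
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

# Hypothetical toy setup: 2 components, vocabulary of 3 words.
pi = np.array([0.5, 0.5])
phi = np.array([[0.7, 0.2, 0.1],    # component 0 favors word 0
                [0.1, 0.2, 0.7]])   # component 1 favors word 2
mu = np.array([7.0, 10.0])          # component 1 has larger losses
sigma = np.array([1.0, 1.0])

counts = np.array([[3, 1, 0],
                   [0, 1, 3]])
y = np.array([1000.0, 30000.0])

resp = responsibilities(counts, y, pi, phi, mu, sigma)
print(resp.round(4))
```

In this toy setup the text and the amount point to the same component for each claim, so each claim is assigned almost entirely to one component. When the two sources disagree, the posterior weighs them through the sum of the two log-likelihood terms.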

Paper Structure (17 sections, 35 equations, 9 figures, 13 tables, 3 algorithms)

Introduction
Loss Dirichlet multinomial mixture model
Relevant distributions in the LDMM model
The EM algorithm for the MAP estimate
The Gibbs sampler for the posterior distribution
Model selection
Posterior predictive distribution and risk measures
Experimental study
Data description
Parameter estimation
MAP estimates
Posterior distribution of parameters
Prediction on the test RBNS claims
Component analysis
Conclusions
...and 2 more sections

Figures (9)

Figure 1: Graphical illustration of the LDMM model
Figure 2: The distribution of the claims amount.
Figure 3: Histogram of word counts of the claim description.
Figure 4: Word clouds using the TF (left) and TF-IDF (right).
Figure 5: The convergence of the EM algorithm for Models 2LN and 2GB2.
...and 4 more figures