A Bayesian Approach for Prioritising Driving Behaviour Investigations in Telematic Auto Insurance Policies

Mark McLeod, Bernardo Perez-Orozco, Nika Lee, Davide Zilli

TL;DR

The paper tackles identifying delivery driving within telematics data to prevent underinsurance and misrepresentation. It introduces a two-stage approach: a per-trip GPS-path classifier using engineered features and a Bayesian Beta-binomial mixture model to rank policies for investigation. Parameter learning is performed with MCMC, enabling informative priors and full posterior uncertainty in policy classifications. A year-long deployment demonstrates strong practical value, achieving high accuracy on reviewed policies and significantly reducing manual review workload, enabling scalable risk identification.

Abstract

Automotive insurers increasingly have access to telematic information via black-box recorders installed in the insured vehicle, and wish to identify undesirable behaviour which may signify increased risk or uninsured activities. However, identification of such behaviour with machine learning is non-trivial, and results are far from perfect, requiring human investigation to verify suspected cases. An appropriately formed priority score, generated by automated analysis of GPS data, allows underwriters to make more efficient use of their time, improving detection of the behaviour under investigation. An example of such behaviour is the use of a privately insured vehicle for commercial purposes, such as delivering meals and parcels. We first use trip GPS and accelerometer data, augmented by geospatial information, to train an imperfect classifier for delivery driving on a per-trip basis. We then use a mixture of Beta-Binomial distributions to model each policyholder's propensity to undertake trips that receive a positive classification, treating that propensity as drawn from either a rare high-scoring group or a common low-scoring group, and learn the parameters of this model using MCMC. This model provides a posterior probability that any policyholder will be a regular generator of automated alerts, given any number of trips and alerts. This posterior probability is converted to a priority score, which was used to select the most valuable candidates for manual investigation. In testing over a 1-year period, policyholders were ranked weekly by likelihood of commercial driving activity. The top 0.9% have been reviewed at least once by the underwriters at the time of writing, and of those 99.4% have been confirmed as correctly identified, showing that the approach has achieved a significant improvement in the efficiency of human resource allocation compared to manual searching.
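
The policy-level step described in the abstract can be illustrated with a minimal Python sketch. It assumes a two-component Beta-Binomial mixture with fixed, purely illustrative parameter values; the paper instead learns the parameters with MCMC and works with the full posterior, so the numbers and the names used here (`posterior_high_group`, `weight_high`, `low`, `high`) are hypothetical and not taken from the paper.

```python
from math import exp, lgamma, log


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_beta_binomial(alerts, trips, a, b):
    """Log-probability of `alerts` positives in `trips` trials under Beta-Binomial(a, b)."""
    log_choose = lgamma(trips + 1) - lgamma(alerts + 1) - lgamma(trips - alerts + 1)
    return log_choose + log_beta(alerts + a, trips - alerts + b) - log_beta(a, b)


def posterior_high_group(alerts, trips, weight_high, low, high):
    """Posterior probability that a policy belongs to the rare high-scoring group.

    weight_high: assumed prior share of policies in the high-scoring group.
    low, high:   assumed (a, b) Beta parameters of each group's alert rate.
    """
    log_hi = log(weight_high) + log_beta_binomial(alerts, trips, *high)
    log_lo = log(1.0 - weight_high) + log_beta_binomial(alerts, trips, *low)
    # Normalise in log space to avoid underflow for large trip counts.
    m = max(log_hi, log_lo)
    hi, lo = exp(log_hi - m), exp(log_lo - m)
    return hi / (hi + lo)


# Illustrative values only: 2% of policies in the high group,
# low group alert rate ~ Beta(1, 30), high group alert rate ~ Beta(8, 4).
WEIGHT_HIGH, LOW, HIGH = 0.02, (1.0, 30.0), (8.0, 4.0)

for alerts, trips in [(0, 5), (2, 20), (15, 20)]:
    score = posterior_high_group(alerts, trips, WEIGHT_HIGH, LOW, HIGH)
    print(f"{alerts} alerts in {trips} trips -> posterior {score:.4f}")
```

With no trips observed the posterior equals the prior group weight, and it moves towards 1 as the share of alerted trips grows. Ranking policies by this value is one simple way to form a priority score; the paper's own conversion from posterior to score may differ.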

Paper Structure (12 sections, 9 equations, 8 figures)

Figures (8)

  • Figure 1: An example of a delivery trip candidate. Red markers represent stops at busy streets with nearby restaurants; blue markers represent stops at residential roads; and the green marker represents an (anonymised) policyholder home address. The radius of each marker is proportional to the amount of time the driver stayed there. The driver visits each blue marker for only a few minutes before returning to the area of the red markers. This is a recognised pattern in delivery driving behaviour.
  • Figure 2: Output of our unsupervised pipeline on the deliveries dataset. This 2-dimensional representation of the dataset exhibits four distinct groups discovered by our approach. By matching up a handful of manually-identified true delivery trips with the clusters, we found that Cluster 3 (red) contained all of these examples. When a random selection of 62 samples from Cluster 3 was submitted for manual review by the insurer's underwriting team, 58 of them were found to be true delivery trips.
  • Figure 3: Directed Graphical Model for the latent variable mixture model construction. Only the data $\{x, y\}$ and the model hyperparameters $\Psi$ are observed; all other quantities must be inferred.
  • Figure 4: Samples of the prior (top row) and posterior (bottom row) of our belief over the distributions of $p(q \mid k = 0)$ (left column), $p(\theta)$ (centre column), and $p(q \mid k = 1)$ (right column). Under prior belief there is a relatively wide range of likely distributions; however, after conditioning on observed data, these distributions are more tightly defined.
  • Figure 5: Screenshot of web application (on dummy data) developed to review drivers suspected to carry out commercial activities outside their policy T&Cs. The Priority column provides the output of the model described in Section \ref{sec:policy-classification}.
  • ...and 3 more figures