An Enhanced Focal Loss Function to Mitigate Class Imbalance in Auto Insurance Fraud Detection with Explainable AI

Francis Boabang, Samuel Asante Gyamerah

TL;DR

This work tackles the challenge of extreme class imbalance in auto-insurance fraud detection by proposing a three-stage training framework that starts with a convex surrogate focal loss for stable initialization, progresses through a controlled non-convex intermediate loss to enhance feature discrimination, and ends with the standard focal loss to sharpen minority-class sensitivity. The approach is underpinned by a convexity analysis in the probability space, enabling a reliable warm-start before non-convex refinement, and is implemented on an LSTM-based time-series model with SMOTE-based resampling. Evaluations on a proprietary dataset show improvements in minority-class F1 and AUC compared to traditional focal-loss training and resampling baselines, with SHAP-based explanations providing transparent feature attribution that is useful for actuarial and fraud analytics. The combination of stabilized optimization and interpretable explanations offers practical benefits for real-world fraud detection systems operating under substantial skew, and the paper suggests avenues for adaptive transition strategies and cross-domain applications.

Abstract

Detecting fraudulent auto-insurance claims remains a challenging classification problem, largely due to the extreme imbalance between legitimate and fraudulent cases. Standard learning algorithms tend to overfit to the majority class, resulting in poor detection of economically significant minority events. This paper proposes a structured three-stage training framework that integrates a convex surrogate of focal loss for stable initialization, a controlled non-convex intermediate loss to improve feature discrimination, and the standard focal loss to refine minority-class sensitivity. We derive conditions under which the surrogate retains convexity in the prediction space and show how this facilitates more reliable optimization when combined with deep sequential models. Using a proprietary auto-insurance dataset, the proposed method improves minority-class F1-scores and AUC relative to conventional focal-loss training and resampling baselines. The approach also provides interpretable feature-attribution patterns through SHAP analysis, offering transparency for actuarial and fraud-analytics applications.
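
To make the staged idea concrete, the sketch below implements the standard binary focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), together with a simple epoch-based stage selector. This is an illustration under stated assumptions, not the paper's method: the abstract does not give the form of the convex surrogate or of the intermediate loss, so the sketch stands in for the three stages by varying only the focusing parameter gamma (gamma = 0 reduces focal loss to alpha-weighted cross-entropy). The function names, stage boundaries, and gamma values are all hypothetical.

```python
import math


def focal_loss(probs, labels, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss.

    probs  : predicted probabilities of the positive (fraud) class
    labels : 0/1 ground-truth labels
    gamma  : focusing parameter; gamma = 0 gives alpha-weighted cross-entropy
    alpha  : weight on the positive class (1 - alpha on the negative class)
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)           # avoid log(0)
        p_t = p if y == 1 else 1.0 - p            # probability of the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


def gamma_for_epoch(epoch, stage_ends=(5, 10), gammas=(0.0, 1.0, 2.0)):
    """Hypothetical three-stage schedule (an assumption, not from the paper).

    Stage 1 (epoch < stage_ends[0]) : gammas[0], a stand-in for the warm start
    Stage 2 (epoch < stage_ends[1]) : gammas[1], a stand-in for the intermediate loss
    Stage 3 (otherwise)             : gammas[2], the standard focal loss setting
    """
    if epoch < stage_ends[0]:
        return gammas[0]
    if epoch < stage_ends[1]:
        return gammas[1]
    return gammas[2]


if __name__ == "__main__":
    # Toy batch: one confidently correct fraud case, one missed fraud case,
    # and two legitimate claims.
    probs = [0.9, 0.2, 0.1, 0.3]
    labels = [1, 1, 0, 0]
    for epoch in (0, 5, 10):
        g = gamma_for_epoch(epoch)
        print(epoch, g, focal_loss(probs, labels, gamma=g))
```

In a real training loop the selected loss would be applied to the LSTM's outputs at each epoch; the hard stage boundaries used here are only one possible transition rule.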

Paper Structure

This paper contains 8 sections, 24 equations, 4 figures, 2 tables, 2 algorithms.

Figures (4)

  • Figure 1: Chi-square test $p$-values for pairwise associations between categorical features in the insurance fraud dataset. Darker blue indicates stronger dependence (small $p$-values), while lighter red denotes weaker association.
  • Figure 2: Overview of the auto-insurance fraud detection system under class imbalance, with three-stage focal-loss training and SHAP-based model explanation.
  • Figure 3: ROC curves for the different training schedules under class imbalance.
  • Figure 4: Feature index vs. SHAP values under class imbalance.

Theorems & Definitions (2)

  • Remark 1
  • Remark 2