Claim Automation using Large Language Model

Zhengda Mo, Zhiyu Quan, Eli O'Donohue, Kaiwen Zhong

TL;DR

This study proposes a locally deployed, governance-aware language modeling component that generates structured corrective-action recommendations from unstructured claim narratives. The component is built by fine-tuning pretrained LLMs with Low-Rank Adaptation (LoRA), and the results demonstrate its promise as a reliable and governable building block for insurance applications.

Abstract

While Large Language Models (LLMs) have achieved strong performance on general-purpose language tasks, their deployment in regulated and data-sensitive domains, including insurance, remains limited. Leveraging millions of historical warranty claims, we propose a locally deployed, governance-aware language modeling component that generates structured corrective-action recommendations from unstructured claim narratives. We fine-tune pretrained LLMs using Low-Rank Adaptation (LoRA), scoping the model to an initial decision module within the claim processing pipeline to speed up claim adjusters' decisions. We assess this module using a multi-dimensional evaluation framework that combines automated semantic similarity metrics with human evaluation, enabling a rigorous examination of both practical utility and predictive accuracy. Our results show that domain-specific fine-tuning substantially outperforms commercial general-purpose and prompt-based LLMs, with approximately 80% of the evaluated cases achieving near-identical matches to ground-truth corrective actions. Overall, this study provides both theoretical and empirical evidence that domain-adaptive fine-tuning can align model output distributions more closely with real-world operational data, demonstrating its promise as a reliable and governable building block for insurance applications.

Paper Structure

This paper contains 64 sections, 63 equations, 9 figures, and 7 tables.

Figures (9)

  • Figure 1: Overview of the token-level generation architecture used for claim automation.
  • Figure 2: LoRA adaptation applied to a single projection matrix. The original weight matrix $W_\mathrm{frozen} \in \mathbb{R}^{d_\mathrm{out} \times d_\mathrm{in}}$ remains unchanged, while trainable matrices $A \in \mathbb{R}^{r \times d_\mathrm{in}}$ and $B \in \mathbb{R}^{d_\mathrm{out} \times r}$ introduce a low-rank update via $\Delta W = B A$. The effective weight is $W_\mathrm{frozen} + \Delta W$.
  • Figure 3: Self-attention module and FFN in a transformer block. Each matrix represented by a light blue parallelogram corresponds to a learnable projection matrix within the block. These matrices are potential targets for LoRA adaptation, where a low-rank update can be applied to each matrix individually.
  • Figure 4: Semantic similarity distributions measured by BERT cosine similarity.
  • Figure 5: BERT cosine similarity: DeepSeek-R1 + Fine-tune versus Gemini-2.5-Flash on the full evaluation set and HQ subset.
  • ...and 4 more figures
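
The low-rank update described in the Figure 2 caption can be illustrated with a minimal sketch. It is not the authors' implementation: the dimensions, the helper names (`matmul`, `matadd`, `init_lora`, `effective_weight`), and the initialization are assumptions chosen for illustration. In particular, initializing $B$ to zeros and $A$ to small random values follows the common LoRA convention and is not stated in the caption, and the scaling factor often applied to $\Delta W$ is omitted because the caption defines $\Delta W = B A$ directly. Plain Python lists are used so the sketch has no dependencies.

```python
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def matadd(X, Y):
    """Add two matrices of the same shape element-wise."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def init_lora(d_out, d_in, r, seed=0):
    """Create trainable factors A (r x d_in) and B (d_out x r).

    Assumption: B starts at zero and A at small Gaussian values, so the
    update Delta W = B A is zero before any training step.
    """
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
    B = [[0.0] * r for _ in range(d_out)]
    return A, B


def effective_weight(W_frozen, A, B):
    """Return W_frozen + B A; W_frozen itself is never modified."""
    return matadd(W_frozen, matmul(B, A))


# Hypothetical toy dimensions; real projection matrices are far larger.
d_out, d_in, r = 4, 6, 2
W_frozen = [[float(i * d_in + j) for j in range(d_in)] for i in range(d_out)]
A, B = init_lora(d_out, d_in, r)
W_eff = effective_weight(W_frozen, A, B)

# Trainable parameter counts: full matrix versus the two low-rank factors.
full_params = d_out * d_in
lora_params = r * (d_in + d_out)
```

In this toy setting the two counts are close (24 versus 20), but the gap widens quickly with dimension: the full matrix grows as $d_\mathrm{out} \cdot d_\mathrm{in}$, while the factors grow as $r\,(d_\mathrm{in} + d_\mathrm{out})$ with $r$ held small.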