Quebec Automobile Insurance Question-Answering With Retrieval-Augmented Generation

David Beauchemin, Zachary Gagnon, Richard Khoury

TL;DR

This study introduces two corpora and uses them to assess, both automatically and manually, how well GPT-4o, a state-of-the-art (SOTA) LLM, answers Quebec automobile insurance questions. It demonstrates that LLM QA is not yet reliable enough for mass use in critical areas.

Abstract

Large Language Models (LLMs) perform outstandingly in various downstream tasks, and the Retrieval-Augmented Generation (RAG) architecture has been shown to improve performance for legal question answering (Nuruzzaman and Hussain, 2020; Louis et al., 2024). However, there are few applications to question answering over insurance documents, a specific type of legal document. This paper introduces two corpora: the Quebec Automobile Insurance Expertise Reference Corpus and a set of 82 Expert Answers to Layperson Automobile Insurance Questions. Our study leverages both corpora to automatically and manually assess how well GPT-4o, a state-of-the-art LLM, answers Quebec automobile insurance questions. Our results demonstrate that, on average, using our expertise reference corpus generates better responses on both automatic and manual evaluation metrics. However, they also highlight that LLM QA is not yet reliable enough for mass use in critical areas. Indeed, our results show that between 5% and 13% of answered questions include a false statement that could lead to customer misunderstanding.

Paper Structure

This paper contains 30 sections, 4 figures, 5 tables.

Figures (4)

  • Figure 1: A representative instance of the three-step RAG process applied to question answering. 1) Indexing: Documents are split into chunks and encoded into vectors stored in a vector database. 2) Retrieval: The top-k chunks most relevant to the question are retrieved based on semantic similarity. 3) Generation: The original question and the retrieved chunks are input together into the LLM to generate the final answer. The illustration is taken from Gao et al. (2023).
  • Figure 2: Zero-shot prompt used for text generation. Blue boxes contain the task instructions. Yellow boxes contain the prefix for the model to continue.
  • Figure 3: Prompt used for text generation. Blue boxes contain the task instructions. Yellow boxes contain the prefix for the model to continue.
  • Figure 4: The Prodigy annotation interface (in French) used for evaluation.
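
The three RAG steps described in the Figure 1 caption (indexing, retrieval, generation) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the word-based chunking, the toy bag-of-words "embedding" (a real system would use a neural encoder and a vector database), the prompt layout, and the two placeholder sentences, which are not taken from the paper's corpora. The generation step stops at building the prompt, so no LLM is called.

```python
import math
import re
from collections import Counter


def chunk(text, size=12):
    """Split a document into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a neural encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents, size=12):
    """Step 1 (indexing): chunk every document and store (chunk, vector) pairs."""
    return [(c, embed(c)) for doc in documents for c in chunk(doc, size)]


def retrieve(index, question, k=2):
    """Step 2 (retrieval): return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [c for c, _ in ranked[:k]]


def build_prompt(question, chunks):
    """Step 3 (generation): assemble the text that would be sent to the LLM."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Hypothetical placeholder sentences, not taken from the paper's corpora.
documents = [
    "Collision coverage pays for damage to the insured vehicle after a crash.",
    "Civil liability coverage pays for damage caused to other people.",
]
index = build_index(documents)
question = "What pays for damage to my vehicle after a crash?"
top_chunks = retrieve(index, question, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

In this sketch the retrieved chunk is the first placeholder sentence, because it shares the most words with the question. Swapping `embed` for a dense encoder would change only the similarity scores, not the overall three-step structure.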