FaaF: Facts as a Function for the evaluation of generated text

Vasileios Katranidis, Gabor Barany

TL;DR

FaaF (Facts as a Function) is a new approach to the fact verification task that leverages the function-calling capabilities of LMs. It significantly enhances the ability of LMs to identify unsupported facts in texts, while also improving efficiency and significantly lowering costs compared to prompt-based methods.

Abstract

The demand for accurate and efficient verification of information in texts generated by large language models (LMs) is at an all-time high, but the problem remains unresolved. Recent efforts have focused on extracting and verifying atomic facts from these texts via prompting LM evaluators. However, we demonstrate that this method of prompting is unreliable when faced with incomplete or inaccurate reference information. We introduce Facts as a Function (FaaF), a new approach to the fact verification task that leverages the function-calling capabilities of LMs. FaaF significantly enhances the ability of LMs to identify unsupported facts in texts, while also improving efficiency and significantly lowering costs compared to prompt-based methods. Additionally, we propose a framework for evaluating factual recall in Retrieval Augmented Generation (RAG) systems, which we employ to compare prompt-based and FaaF methods using various LMs under challenging conditions.
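
The idea of turning a set of fact statements into a single callable function can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `verify_facts`, the argument names `fact_1`, `fact_2`, ..., the verdict labels, and the JSON-schema layout are all assumptions chosen for illustration. The sketch builds one function definition whose arguments are the facts, so that an LM with function calling could return a verdict for every fact in one response, and then parses such a response.

```python
import json

# Hypothetical verdict labels; the labels used in the paper may differ.
VERDICTS = ["supported", "unsupported"]


def build_fact_function(facts):
    """Build a JSON-schema-style function definition with one argument per fact.

    The schema layout is an assumption modeled on common function-calling
    formats, not a specific vendor API.
    """
    properties = {}
    for i, fact in enumerate(facts, start=1):
        properties[f"fact_{i}"] = {
            "type": "string",
            "enum": VERDICTS,
            "description": fact,  # the fact statement itself describes the argument
        }
    return {
        "name": "verify_facts",  # hypothetical name
        "description": "Judge each fact against the given reference text.",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": list(properties),
        },
    }


def parse_verdicts(facts, arguments_json):
    """Map each fact to the verdict found in the LM's function-call arguments."""
    args = json.loads(arguments_json)
    return {fact: args.get(f"fact_{i}") for i, fact in enumerate(facts, start=1)}


facts = ["Paris is the capital of France.", "The Seine flows through Paris."]
schema = build_fact_function(facts)

# A stand-in for the arguments an LM might return; no model is called here.
fake_lm_arguments = json.dumps({"fact_1": "supported", "fact_2": "unsupported"})
verdicts = parse_verdicts(facts, fake_lm_arguments)
```

Because every fact is a named argument with a constrained set of values, all verdicts arrive in one structured response rather than one free-text answer per fact.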

Paper Structure (6 sections, 6 equations, 3 figures, 3 tables)

Figures (3)

  • Figure 1: An overview of FaaF. A constructor dynamically creates a function object from a set of fact statements. Function calling allows LMeval to verify all facts within a single call when provided with an input reference text. FaaF significantly reduces the error rate in identifying unsupported facts compared to prompting, while reducing the number of LMeval calls and output tokens by more than a factor of five.
  • Figure 2: Overview of the factual recall evaluation for RAG. Given a set of ground truth Answers, facts are extracted via LMf. The hypothesized responses of the RAG (in this instance Ungrounded Answer and Poor Answer) are then tested for recall against the extracted facts.
  • Figure 3: LMeval call count for a full evaluation of WikiEvalFacts. FaaF formulations require more than five times fewer LM calls, given an average of 5.6 fact statements per QA pair.
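
The call-count reduction in Figure 3 follows from simple arithmetic, sketched below under two assumptions: prompt-based verification issues one LM call per fact statement, and FaaF issues one call per QA pair (all facts verified in a single function call, as in Figure 1). The per-pair fact counts are invented toy values chosen to average 5.6; they are not taken from WikiEvalFacts.

```python
def prompt_call_count(facts_per_pair):
    """Assumed prompting baseline: one LM call for every fact statement."""
    return sum(facts_per_pair)


def faaf_call_count(facts_per_pair):
    """Assumed FaaF setup: one LM call per QA pair, covering all its facts."""
    return len(facts_per_pair)


# Toy data (illustrative only): number of facts for each of five QA pairs.
facts_per_pair = [5, 6, 6, 5, 6]

prompt_calls = prompt_call_count(facts_per_pair)
faaf_calls = faaf_call_count(facts_per_pair)
ratio = prompt_calls / faaf_calls  # equals the average number of facts per pair
```

Under these assumptions the ratio of calls equals the average number of facts per QA pair, which is why an average of 5.6 facts corresponds to more than five times fewer calls.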