Factuality of Large Language Models: A Survey

Yuxia Wang; Minghan Wang; Muhammad Arslan Manzoor; Fei Liu; Georgi Georgiev; Rocktim Jyoti Das; Preslav Nakov

TL;DR

This survey critically analyzes existing work on evaluating and improving the factuality of large language models (LLMs). It aims to identify the major challenges and their associated causes, point to potential solutions for improving the factuality of LLMs, and analyze the obstacles to automated factuality evaluation for open-ended text generation.

Abstract

Large language models (LLMs), especially when instruction-tuned for chat, have become part of our daily lives, freeing people from the process of searching, extracting, and integrating information from multiple sources by offering a straightforward answer to a variety of questions in a single place. Unfortunately, in many cases, LLM responses are factually incorrect, which limits their applicability in real-world scenarios. As a result, research on evaluating and improving the factuality of LLMs has attracted a lot of attention recently. In this survey, we critically analyze existing work with the aim of identifying the major challenges and their associated causes, pointing to potential solutions for improving the factuality of LLMs, and analyzing the obstacles to automated factuality evaluation for open-ended text generation. We further offer an outlook on where future research should go.

Paper Structure (26 sections, 1 figure, 2 tables)

Introduction
Background
Hallucination vs. Factuality
Trustworthiness/Reliability vs. Factuality
Evaluating Factuality
Datasets and Metrics
Other Metrics
Improving Factuality
Pre-training
Tuning and RLXF
Retrieval Augmentation
Inference
Decoding Strategy
ICL and Self-reasoning
Retrieval Augmentation
...and 11 more sections

Figures (1)

Figure 1: Fact-checker framework: claim processor, retriever, and verifier, with the optional step of summarizing and explaining shown in gray.
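
To make the three stages named in the Figure 1 caption concrete, here is a minimal Python sketch of a claim processor, retriever, and verifier chained together. It is an illustration under stated assumptions, not the survey's implementation: every function name, the in-memory corpus, and the token-overlap heuristics are hypothetical stand-ins. A real system would typically use a search index for retrieval and an entailment model or LLM for verification, and the optional summarize-and-explain step is omitted.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Verdict:
    claim: str
    evidence: Optional[str]
    supported: bool


def _tokens(text: str) -> set:
    """Lowercased alphanumeric tokens; a deliberately crude normalizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def process_claims(response: str) -> List[str]:
    """Claim processor (assumption: one sentence equals one claim)."""
    parts = re.split(r"(?<=[.!?])\s+", response.strip())
    return [p for p in parts if p]


def retrieve(claim: str, corpus: List[str]) -> Optional[str]:
    """Retriever (assumption): pick the passage sharing the most tokens."""
    claim_tokens = _tokens(claim)
    best, best_overlap = None, 0
    for passage in corpus:
        overlap = len(claim_tokens & _tokens(passage))
        if overlap > best_overlap:
            best, best_overlap = passage, overlap
    return best


def verify(claim: str, evidence: Optional[str], threshold: float = 1.0) -> bool:
    """Verifier (assumption): supported only if the evidence covers at least
    `threshold` of the claim's tokens. A toy proxy for an entailment check."""
    if evidence is None:
        return False
    claim_tokens = _tokens(claim)
    if not claim_tokens:
        return False
    coverage = len(claim_tokens & _tokens(evidence)) / len(claim_tokens)
    return coverage >= threshold


def fact_check(response: str, corpus: List[str]) -> List[Verdict]:
    """Run claim processing, retrieval, and verification in sequence."""
    verdicts = []
    for claim in process_claims(response):
        evidence = retrieve(claim, corpus)
        verdicts.append(Verdict(claim, evidence, verify(claim, evidence)))
    return verdicts
```

Calling `fact_check(response, corpus)` returns one `Verdict` per sentence of the response. The full-coverage threshold is chosen only so that a single substituted word flips the verdict in small examples; token overlap cannot detect negation or paraphrase, which is one reason practical verifiers rely on learned models instead.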
