The Pitfalls of Defining Hallucination

Kees van Deemter

TL;DR

The paper argues that current definitions and classifications of hallucination in NLG are inconsistent and insufficient for robust evaluation. It proposes a logic-based synthesis to unify input–output veracity analyses, framing categories in terms of entailment, independence, and boundary cases, while acknowledging pragmatic and world-knowledge considerations. It extends these ideas to Large Language Models and open-ended tasks, introducing notions such as fact-conflicting information and withholdings, and suggests using BDI logic to reason about audience interpretation. The work highlights limitations of NLI-based approaches, advocates cross-disciplinary collaboration with logicians, and emphasizes the need to address ambiguity and figurative language to improve veracity assessment in real-world applications.

Abstract

Despite impressive advances in Natural Language Generation (NLG) and Large Language Models (LLMs), researchers are still unclear about important aspects of NLG evaluation. To substantiate this claim, I examine current classifications of hallucination and omission in Data-text NLG, and I propose a logic-based synthesis of these classifications. I conclude by highlighting some remaining limitations of all current thinking about hallucination and by discussing implications for LLMs.
