An Audit on the Perspectives and Challenges of Hallucinations in NLP

Pranav Narayanan Venkit, Tatiana Chakravorti, Vipul Gupta, Heidi Biggs, Mukund Srinath, Koustava Goswami, Sarah Rajtmajer, Shomir Wilson

TL;DR

This audit exposes a fragmented landscape around NLP hallucination, showing widespread variability in definitions, frameworks, and metrics across 103 papers, complemented by a practitioner survey of 171 researchers. By mapping seven NLP subfields and 31 conceptual frameworks, the study reveals a lack of consensus and minimal engagement with sociotechnical perspectives. It contributes an explicit call for standardized terminology, transparent methodological documentation, and sociotechnical framing, accompanied by dual recommendations for authors and the research community. The work advances practical guidance for reducing misinterpretation and societal risk from generative models, with implications for funding, evaluation, and governance of AI systems.

Abstract

We audit how hallucination in large language models (LLMs) is characterized in peer-reviewed literature through a critical examination of 103 publications across NLP research. This examination reveals a lack of agreement on the term 'hallucination' in the field of NLP. To complement our audit, we also conduct a survey of 171 practitioners from the fields of NLP and AI to capture varying perspectives on hallucination. Our analysis underscores the need for explicit definitions and frameworks outlining hallucination within NLP and highlights potential challenges, while our survey responses provide a thematic understanding of the influence and ramifications of hallucination in society.

Paper Structure

This paper contains 34 sections, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Articles published each year (from 2013 to 2023) in SCOPUS that contain the term 'hallucination' AND ('NLP' OR 'AI') in the title, abstract, or keywords.
  • Figure 2: Hallucination evaluation metrics used in NLP.
  • Figure 3: Respondents' familiarity with 'Hallucination'
  • Figure 4: Frequency of encountering 'Hallucination'
  • Figure 5: Frequency of Text Generation Model Usage