Internal Consistency and Self-Feedback in Large Language Models: A Survey

Xun Liang, Shichao Song, Zifan Zheng, Hanyu Wang, Qingchen Yu, Xunkai Li, Rong-Hua Li, Yi Wang, Zhonghao Wang, Feiyu Xiong, Zhiyu Li

TL;DR

This survey reframes the persistent issues of reasoning gaps and hallucinations in large language models through the lens of internal consistency. It introduces Self-Feedback, a two-module framework of Self-Evaluation and Self-Update, to mine and leverage consistency signals across response, decoding, and latent layers. The work catalogs a taxonomy of methods, signal types, and task lines (reasoning elevation, hallucination alleviation, and others), and discusses evaluation paradigms and contested conclusions about effectiveness. It also outlines future directions, including textual self-awareness and unified, multi-layer evaluation frameworks, to drive more reliable and truthful LLM behavior in real-world settings.

Abstract

Large language models (LLMs) often exhibit deficient reasoning or generate hallucinations. To address these problems, a line of studies prefixed with "Self-", such as Self-Consistency, Self-Improve, and Self-Refine, has emerged. They share a common trait: LLMs evaluate and update themselves. Nonetheless, these efforts lack a unified perspective for summarizing them, as existing surveys predominantly focus on categorization. In this paper, we adopt the unified perspective of internal consistency, which offers explanations for both reasoning deficiencies and hallucinations. Internal consistency refers to the consistency of expressions among an LLM's latent, decoding, or response layers, measured through sampling. We then introduce Self-Feedback, a theoretical framework for mining internal consistency. The framework consists of two modules: Self-Evaluation and Self-Update. The former captures internal consistency signals, while the latter uses those signals to enhance either the model's response or the model itself. This framework has been employed in numerous studies. We systematically classify these studies by tasks and lines of work; summarize relevant evaluation methods and benchmarks; and examine the question, "Does Self-Feedback Really Work?" We also propose several critical viewpoints, including the "Hourglass Evolution of Internal Consistency", the "Consistency Is (Almost) Correctness" hypothesis, and "The Paradox of Latent and Explicit Reasoning". The relevant resources are open-sourced at https://github.com/IAAR-Shanghai/ICSFSurvey.
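
The two-module structure described in the abstract can be illustrated with a minimal Python sketch. It is not the survey's implementation: it assumes the simplest response-layer signal (agreement among sampled answers) and uses majority voting, in the spirit of Self-Consistency, as a stand-in for Self-Update. The names `self_evaluate`, `self_feedback`, `scripted_sampler`, and the `threshold` parameter are hypothetical, and the sampler replaces a real LLM call.

```python
from collections import Counter
from typing import Callable, List, Tuple


def self_evaluate(answers: List[str]) -> Tuple[str, float]:
    """Self-Evaluation (assumed form): measure response-level consistency.

    Returns the most frequent answer and the fraction of samples that agree
    with it. This fraction is the consistency signal in this sketch.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    top_answer, top_count = Counter(answers).most_common(1)[0]
    return top_answer, top_count / len(answers)


def self_feedback(
    sample_fn: Callable[[str], str],
    question: str,
    n_samples: int = 5,
    threshold: float = 0.6,
) -> Tuple[str, float, bool]:
    """Sample several answers, evaluate their agreement, then update.

    Self-Update (assumed form): replace a single response with the majority
    answer, and flag whether its consistency clears the threshold.
    """
    answers = [sample_fn(question) for _ in range(n_samples)]
    answer, consistency = self_evaluate(answers)
    return answer, consistency, consistency >= threshold


def scripted_sampler(outputs: List[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for an LLM: replays a fixed list of answers."""
    it = iter(outputs)
    return lambda question: next(it)


if __name__ == "__main__":
    sampler = scripted_sampler(["42", "41", "42", "42"])
    print(self_feedback(sampler, "What is 6 * 7?", n_samples=4))
```

In a real system the sampler would draw responses from the model at a nonzero temperature, and a low consistency score could trigger refinement or abstention instead of a simple flag.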

Paper Structure

This paper contains 49 sections, 13 equations, 10 figures, 6 tables, and 1 algorithm.

Figures (10)

  • Figure 1: GPT-4o provides different answers to the same question. The complete responses can be found at https://github.com/IAAR-Shanghai/ICSFSurvey.
  • Figure 2: Relative search interest for the keywords "LLM Hallucination" and "LLM Reasoning" from Google Trends on June 14, 2024.
  • Figure 3: Core Concepts and Article Organization (mainly covering the sections from internal consistency through other tasks).
  • Figure 4: Positions of the Three Types of Consistency
  • Figure 5: The Hourglass Evolution of Internal Consistency
  • ...and 5 more figures