Table of Contents
Fetching ...

Large Language Models have Intrinsic Self-Correction Ability

Dancheng Liu, Amir Nassereldine, Ziming Yang, Chenhui Xu, Yuting Hu, Jiajie Li, Utkarsh Kumar, Changjae Lee, Ruiyang Qin, Yiyu Shi, Jinjun Xiong

TL;DR

This paper argues that intrinsic self-correction in large language models is a real capability, not an artifact of external feedback. It develops a theoretical and empirical framework linking intrinsic SC to chain-of-thought-like reasoning and self-verification, and identifies zero temperature and unbiased prompts as key enablers. Through experiments across multiple models and benchmarks, the authors show that intrinsic SC can improve accuracy when prompts are fair and temperature is kept at zero, with larger models exhibiting stronger SC effects. The work provides practical guidelines for prompting intrinsic SC and contributes to the theoretical understanding of LLM self-correction mechanisms.

Abstract

Large language models (LLMs) have attracted significant attention for their exceptional abilities in various natural language processing tasks, but they suffer from hallucinations that will cause performance degradation. One promising solution to improve the LLMs' performance is to ask LLMs to revise their answer after generation, a technique known as self-correction. Among the two types of self-correction, intrinsic self-correction is considered a promising direction because it does not utilize external knowledge. However, recent works doubt the validity of LLM's ability to conduct intrinsic self-correction. In this paper, we present a novel perspective on the intrinsic self-correction capabilities of LLMs through theoretical analyses and empirical experiments. In addition, we identify two critical factors for successful self-correction: zero temperature and fair prompts. Leveraging these factors, we demonstrate that intrinsic self-correction ability is exhibited across multiple existing LLMs. Our findings offer insights into the fundamental theories underlying the self-correction behavior of LLMs and remark on the importance of unbiased prompts and zero temperature settings in harnessing their full potential.

Large Language Models have Intrinsic Self-Correction Ability

TL;DR

This paper argues that intrinsic self-correction in large language models is a real capability, not an artifact of external feedback. It develops a theoretical and empirical framework linking intrinsic SC to chain-of-thought-like reasoning and self-verification, and identifies zero temperature and unbiased prompts as key enablers. Through experiments across multiple models and benchmarks, the authors show that intrinsic SC can improve accuracy when prompts are fair and temperature is kept at zero, with larger models exhibiting stronger SC effects. The work provides practical guidelines for prompting intrinsic SC and contributes to the theoretical understanding of LLM self-correction mechanisms.

Abstract

Large language models (LLMs) have attracted significant attention for their exceptional abilities in various natural language processing tasks, but they suffer from hallucinations that will cause performance degradation. One promising solution to improve the LLMs' performance is to ask LLMs to revise their answer after generation, a technique known as self-correction. Among the two types of self-correction, intrinsic self-correction is considered a promising direction because it does not utilize external knowledge. However, recent works doubt the validity of LLM's ability to conduct intrinsic self-correction. In this paper, we present a novel perspective on the intrinsic self-correction capabilities of LLMs through theoretical analyses and empirical experiments. In addition, we identify two critical factors for successful self-correction: zero temperature and fair prompts. Leveraging these factors, we demonstrate that intrinsic self-correction ability is exhibited across multiple existing LLMs. Our findings offer insights into the fundamental theories underlying the self-correction behavior of LLMs and remark on the importance of unbiased prompts and zero temperature settings in harnessing their full potential.
Paper Structure (43 sections, 1 theorem, 8 equations, 3 figures, 15 tables)

This paper contains 43 sections, 1 theorem, 8 equations, 3 figures, 15 tables.

Key Result

Proposition 2.1

LLMs are generally under-performing compared to their true ability because hallucination will cause the overall accuracy to decrease.

Figures (3)

  • Figure 1: An example where the biased prompt (left) shifts the answer from correct to incorrect between the intrinsic SC stages, whereas our unbiased prompt (right) maintains the correct answer. The blue, yellow, and red regions correspond to Stage 1,2,3 in Section \ref{['section2.2']}, respectively. Phrases such as "find problems" and "improve" might hint at an incorrect initial answer to the LLM and force it to change answers. On the other hand, the unbiased prompt avoids those unnecessary changes.
  • Figure 2: Comparison of the effect of temperature on SC ability measured as the difference in accuracy before and after SC ($\Delta$ SC.) across the two GPT models on the Commonsense QA dataset.
  • Figure 3: Trend on the change after intrinsic SC across models of different sizes.

Theorems & Definitions (2)

  • Proposition 2.1
  • proof : Proof of Proposition \ref{['lemma1']}