Large Language Models have Intrinsic Self-Correction Ability

Dancheng Liu; Amir Nassereldine; Ziming Yang; Chenhui Xu; Yuting Hu; Jiajie Li; Utkarsh Kumar; Changjae Lee; Ruiyang Qin; Yiyu Shi; Jinjun Xiong

Large Language Models have Intrinsic Self-Correction Ability

Dancheng Liu, Amir Nassereldine, Ziming Yang, Chenhui Xu, Yuting Hu, Jiajie Li, Utkarsh Kumar, Changjae Lee, Ruiyang Qin, Yiyu Shi, Jinjun Xiong

TL;DR

This paper argues that intrinsic self-correction in large language models is a real capability, not an artifact of external feedback. It develops a theoretical and empirical framework linking intrinsic SC to chain-of-thought-like reasoning and self-verification, and identifies zero temperature and unbiased prompts as key enablers. Through experiments across multiple models and benchmarks, the authors show that intrinsic SC can improve accuracy when prompts are fair and temperature is kept at zero, with larger models exhibiting stronger SC effects. The work provides practical guidelines for prompting intrinsic SC and contributes to the theoretical understanding of LLM self-correction mechanisms.

Abstract

Large language models (LLMs) have attracted significant attention for their exceptional abilities in various natural language processing tasks, but they suffer from hallucinations that will cause performance degradation. One promising solution to improve the LLMs' performance is to ask LLMs to revise their answer after generation, a technique known as self-correction. Among the two types of self-correction, intrinsic self-correction is considered a promising direction because it does not utilize external knowledge. However, recent works doubt the validity of LLM's ability to conduct intrinsic self-correction. In this paper, we present a novel perspective on the intrinsic self-correction capabilities of LLMs through theoretical analyses and empirical experiments. In addition, we identify two critical factors for successful self-correction: zero temperature and fair prompts. Leveraging these factors, we demonstrate that intrinsic self-correction ability is exhibited across multiple existing LLMs. Our findings offer insights into the fundamental theories underlying the self-correction behavior of LLMs and remark on the importance of unbiased prompts and zero temperature settings in harnessing their full potential.

Large Language Models have Intrinsic Self-Correction Ability

TL;DR

Abstract

Paper Structure (43 sections, 1 theorem, 8 equations, 3 figures, 15 tables)

This paper contains 43 sections, 1 theorem, 8 equations, 3 figures, 15 tables.

Introduction
New Perspective on Intrinsic Self Correction
Preliminary
Does intrinsic SC exist in LLMs?
Why can't LLM answer questions correctly in the initial attempt?
Experiment Setup
Increasing Temperature Might Decrease Accuracy During SC
Theoretical Analysis
Impact of Temperature on Self-Correction
Designing Fair Prompt for Self-Correction
Theoretical Analysis
Intrinsic SC could be achieved with fair prompt under zero temperature
Ablation study on Model size
Limitations
Conclusion
...and 28 more sections

Key Result

Proposition 2.1

LLMs are generally under-performing compared to their true ability because hallucination will cause the overall accuracy to decrease.

Figures (3)

Figure 1: An example where the biased prompt (left) shifts the answer from correct to incorrect between the intrinsic SC stages, whereas our unbiased prompt (right) maintains the correct answer. The blue, yellow, and red regions correspond to Stage 1,2,3 in Section \ref{['section2.2']}, respectively. Phrases such as "find problems" and "improve" might hint at an incorrect initial answer to the LLM and force it to change answers. On the other hand, the unbiased prompt avoids those unnecessary changes.
Figure 2: Comparison of the effect of temperature on SC ability measured as the difference in accuracy before and after SC ($\Delta$ SC.) across the two GPT models on the Commonsense QA dataset.
Figure 3: Trend on the change after intrinsic SC across models of different sizes.

Theorems & Definitions (2)

Proposition 2.1
proof : Proof of Proposition \ref{['lemma1']}

Large Language Models have Intrinsic Self-Correction Ability

TL;DR

Abstract

Large Language Models have Intrinsic Self-Correction Ability

Authors

TL;DR

Abstract

Table of Contents

Key Result

Figures (3)

Theorems & Definitions (2)