SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales

Tianyang Xu, Shujin Wu, Shizhe Diao, Xiaoze Liu, Xingyao Wang, Yangyi Chen, Jing Gao

TL;DR

SaySelf addresses the reliability gap in LLM outputs by teaching models to emit fine-grained confidence estimates and self-reflective rationales. It introduces a model-specific supervised fine-tuning dataset built from multiple sampled reasoning chains, followed by reinforcement learning with a task-supervision reward to calibrate confidence at the instance level. The approach achieves improved confidence calibration (lower ECE, higher AUROC) while preserving task performance across diverse knowledge-intensive tasks, and provides interpretable rationales that reveal uncertainty sources. These contributions advance trustworthy LLM deployment in real-world knowledge tasks and support safer human-AI interaction.

Abstract

Large language models (LLMs) often generate inaccurate or fabricated information and generally fail to indicate their confidence, which limits their broader applications. Previous work elicits confidence from LLMs by direct or self-consistency prompting, or by constructing specific datasets for supervised fine-tuning. The prompting-based approaches have inferior performance, and the training-based approaches are limited to binary or inaccurate group-level confidence estimates. In this work, we present SaySelf, a training framework that teaches LLMs to express more accurate fine-grained confidence estimates. In addition, beyond the confidence scores, SaySelf initiates the process of directing LLMs to produce self-reflective rationales that clearly identify gaps in their parametric knowledge and explain their uncertainty. This is achieved by using an LLM to automatically summarize the uncertainties in specific knowledge via natural language. The summarization is based on the analysis of the inconsistency in multiple sampled reasoning chains, and the resulting data is utilized for supervised fine-tuning. Moreover, we utilize reinforcement learning with a meticulously crafted reward function to calibrate the confidence estimates; the reward encourages LLMs to deliver accurate, high-confidence predictions and penalizes overconfidence in erroneous outputs. Experimental results on both in-distribution and out-of-distribution datasets demonstrate the effectiveness of SaySelf in reducing the confidence calibration error while maintaining task performance. We show that the generated self-reflective rationales are reasonable and can further contribute to the calibration. The code is made public at https://github.com/xu1868/SaySelf.
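
For readers unfamiliar with the calibration metric named in the TL;DR, the following is a minimal sketch of expected calibration error (ECE) with equal-width bins. It is not taken from the SaySelf codebase: the function name, the bin count, and the assumption that confidences are already scaled to [0, 1] are all illustrative choices.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Equal-width-bin ECE: weighted mean of |accuracy - mean confidence| per bin.

    Assumes each confidence lies in [0, 1] (an assumption of this sketch).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    conf_sum = [0.0] * n_bins  # sum of confidences falling in each bin
    hits = [0] * n_bins        # number of correct predictions in each bin
    count = [0] * n_bins       # number of predictions in each bin

    for c, ok in zip(confidences, correct):
        if not 0.0 <= c <= 1.0:
            raise ValueError("confidence must be in [0, 1]")
        # Bin k covers [k/n_bins, (k+1)/n_bins); 1.0 goes into the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        conf_sum[idx] += c
        hits[idx] += int(ok)
        count[idx] += 1

    ece = 0.0
    for b in range(n_bins):
        if count[b] == 0:
            continue
        accuracy = hits[b] / count[b]
        mean_conf = conf_sum[b] / count[b]
        ece += (count[b] / n) * abs(accuracy - mean_conf)
    return ece
```

A model that states full confidence and is always right scores 0, while one that states full confidence and is always wrong scores 1; lower values indicate better calibration.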

Paper Structure

This paper contains 26 sections, 5 equations, 4 figures, and 6 tables.

Figures (4)

  • Figure 1: The comparison between SaySelf and previous work. SaySelf can produce the self-reflective rationale that explains why the model is uncertain and the fine-grained and accurate confidence estimates. This simple example is constructed for illustration purposes, and the reasoning chain is omitted for brevity.
  • Figure 2: The overview of SaySelf, consisting of the supervised fine-tuning and reinforcement learning from task supervision stages. The former stage trains LLMs to generate self-reflective rationales and confidence estimates based on multiple sampling, and the latter stage employs reinforcement learning to further calibrate the confidence estimates based on task supervision. $q$, $s$, $c$, and $r$ denote question, response, confidence estimate, and self-reflective rationale respectively.
  • Figure 3: Case study of SaySelf's capability to generate insightful self-reflective rationales that effectively capture the internal uncertainty in LLMs. Various clusters illustrate a selection from 100 sampled responses, and the rationale is generated by LLMs. Another example is given in Figure 4 in the Appendix.
  • Figure 4: Case study of SaySelf's capability to generate insightful self-reflective rationales.