CritiCal: Can Critique Help LLM Uncertainty or Confidence Calibration?

Qing Zong; Jiayu Liu; Tianshi Zheng; Chunyang Li; Baixuan Xu; Haochen Shi; Weiqi Wang; Zhaowei Wang; Chunkit Chan; Yangqiu Song

TL;DR

The paper tackles verbalized confidence calibration in LLMs for high-stakes scenarios. It introduces two critique-based approaches, Self-Critique and CritiCal; CritiCal trains on teacher-generated natural language critiques to calibrate confidence expressions. Empirical results show that CritiCal outperforms Self-Critique and other baselines, even surpassing its teacher model, GPT-4o, on complex reasoning tasks, and that it generalizes well to out-of-distribution data. The work also delineates what to critique (uncertainty for open-ended tasks, confidence for multiple-choice tasks) and how to critique (SFT with critique supervision, with DPO as an alternative), offering a scalable path to more reliable verbalized confidence in AI systems.

Abstract

Accurate confidence calibration in Large Language Models (LLMs) is critical for safe use in high-stakes domains, where clear verbalized confidence enhances user trust. Traditional methods that mimic reference confidence expressions often fail to capture the reasoning needed for accurate confidence assessment. We propose natural language critiques as a solution, ideally suited for confidence calibration, as precise gold confidence labels are hard to obtain and often require multiple generations. This paper studies how natural language critiques can enhance verbalized confidence, addressing: (1) What to critique: uncertainty (question-focused) or confidence (answer-specific)? Analysis shows confidence suits multiple-choice tasks, while uncertainty excels in open-ended scenarios. (2) How to critique: self-critique or critique calibration training? We propose Self-Critique, enabling LLMs to critique and optimize their confidence beyond mere accuracy, and CritiCal, a novel Critique Calibration training method that leverages natural language critiques to improve confidence calibration, moving beyond direct numerical optimization. Experiments show that CritiCal significantly outperforms Self-Critique and other competitive baselines, even surpassing its teacher model, GPT-4o, in complex reasoning tasks. CritiCal also shows robust generalization in out-of-distribution settings, advancing the reliability of LLMs.
