ADVICE: Answer-Dependent Verbalized Confidence Estimation

Ki Jung Seo, Sehun Lim, Taeuk Kim

TL;DR

The paper addresses overconfidence in verbalized confidence from large language models by identifying answer-independence as the key shortcoming. It introduces ADVICE, a fine-tuning framework that explicitly grounds confidence estimation in the model's answer through a contrastive, answer-aware objective combining $L_{LM}$, $L_{JSD}$, and $L_{Margin}$. Across open-weight LLMs and multiple QA benchmarks, ADVICE achieves well-calibrated confidence (lower $ECE$ and $NCE$) while preserving task performance and generalizing to out-of-distribution data, with more balanced confidence distributions. The work provides mechanistic insights into confidence verbalization and offers a practical approach to safer, more trustworthy verbalized confidence in LLMs.
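For readers unfamiliar with the calibration metric mentioned above, the following is a minimal sketch of the standard binned expected calibration error (ECE). It is not the paper's evaluation code: the function name, the choice of 10 equal-width bins, and the assumption that verbalized confidence has already been rescaled to [0, 1] are illustrative.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sample-weighted mean of |accuracy - mean confidence| per bin.

    confidences: floats in [0, 1] (assumed rescaled verbalized confidence).
    correct: booleans, True when the corresponding answer was right.
    n_bins: number of equal-width bins (an assumption, not taken from the paper).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one sample is required")

    # Assign each sample to an equal-width bin; confidence 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, is_correct in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, is_correct))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece
```

Under this sketch, a model that states 0.9 confidence on four answers but gets only one right has a gap of 0.65 between stated confidence and accuracy, which is the kind of overconfidence the paper aims to reduce.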

Abstract

Recent progress in large language models (LLMs) has enabled them to express their confidence in natural language, enhancing transparency and reliability. However, their confidence often exhibits overconfidence, the cause of which remains poorly understood. In this work, we conduct a detailed analysis of the dynamics underlying verbalized confidence and identify answer-independence as a key factor, defined as the model's failure to condition confidence on its own answer. To address this, we propose ADVICE (Answer-Dependent Verbalized Confidence Estimation), a fine-tuning framework that facilitates answer-grounded confidence estimation. Extensive experiments show that ADVICE substantially improves confidence calibration while preserving task performance. Further analyses confirm that ADVICE strengthens answer-groundedness, leading to more balanced and well-calibrated confidence distributions. Our findings shed light on the origin of overconfidence and establish a framework for more trustworthy confidence verbalization.

Paper Structure

This paper contains 41 sections, 13 equations, 11 figures, 7 tables.

Figures (11)

  • Figure 1: LLMs tend to verbalize their overconfidence irrespective of whether their answers are correct. We propose a fine-tuning method to mitigate this problem, achieving well-calibrated verbalized confidence.
  • Figure 2: The illustration of the ADVICE (Answer-Dependent VerbalIzed Confidence Estimation) framework.
  • Figure 3: Distributions of JSD values comparing confidence predictions with and without answers. The two distributions for both models are peaked around zero, implying limited use of answer information.
  • Figure 4: Comparison of Attention Rollout scores on three attention directions: (1) Answer to Question, (2) Confidence to Question, and (3) Confidence to Answer.
  • Figure 5: Reliability diagrams on TriviaQA with Gemma-2-9b-it under the ScoreNumber setting. ADVICE achieves high calibration quality comparable to ConfTuner, outperforming Default and Self-Consistency.
  • ...and 6 more figures
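
The Figure 3 caption compares confidence predictions made with and without the answer using the Jensen-Shannon divergence (JSD). As a rough illustration of that quantity, and not the authors' implementation, the sketch below computes the JSD between two discrete distributions over confidence levels. The function names and the example distributions are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i == 0 contribute zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions.

    p, q: equal-length sequences of probabilities, each summing to 1.
    The result lies in [0, ln 2]; 0 means the distributions are identical.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Mixture distribution; m_i > 0 wherever p_i > 0 or q_i > 0.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical distributions over three confidence levels (low, medium, high).
with_answer = [0.05, 0.15, 0.80]
without_answer = [0.05, 0.15, 0.80]
print(js_divergence(with_answer, without_answer))
```

A value near zero, as in this made-up example, means that adding the answer to the context barely changes the predicted confidence distribution, which is how the caption interprets distributions peaked around zero.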