Investigating Uncertainty Calibration of Aligned Language Models under the Multiple-Choice Setting

Guande He, Peng Cui, Jianfei Chen, Wenbo Hu, Jun Zhu

TL;DR

The paper tackles miscalibration in aligned language models under multiple-choice evaluation by isolating two sources of uncertainty—answer and format—and showing that standard alignment conflates them, driving overconfidence. It presents a formal decomposition of predictive uncertainty, analyzes how alignment stages affect each component, and demonstrates that in-context learning helps pre-trained LMs calibrate via format signaling while alignment disrupts this calibration. A practical contribution is a few-shot post-hoc calibration method that uses the pre-trained LM’s predictive distribution to calibrate aligned LMs, outperforming baseline temperature scaling and KDE approaches across tasks. The findings highlight the need for alignment processes that disentangle answer and formatting behaviors, with implications for safer, more reliable deployment of aligned LMs in decision-critical settings.

Abstract

Despite the significant progress made in practical applications of aligned language models (LMs), they tend to be overconfident in output answers compared to the corresponding pre-trained LMs. In this work, we systematically evaluate the impact of the alignment process on logit-based uncertainty calibration of LMs under the multiple-choice setting. We first conduct a careful empirical study of how aligned LMs differ in calibration from their pre-trained counterparts. Experimental results reveal that there are two distinct uncertainties in LMs under the multiple-choice setting, which are responsible for the answer decision and the format preference of the LMs, respectively. Then, we investigate the role of these two uncertainties in aligned LMs' calibration through fine-tuning in simple synthetic alignment schemes and conclude that one reason for aligned LMs' overconfidence is the conflation of these two types of uncertainty. Furthermore, we examine the utility of common post-hoc calibration methods for aligned LMs and propose an easy-to-implement and sample-efficient method to calibrate aligned LMs. We hope our findings provide insights into the design of more reliable alignment processes for LMs.
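
To make "logit-based uncertainty calibration" concrete, the following is a minimal sketch of how calibration is commonly measured in a multiple-choice setting: take the logits of the answer-choice tokens, normalize them into a distribution, and compute the expected calibration error (ECE). This is not the authors' code. The function names, the restriction of the softmax to the choice tokens, and the use of 10 equal-width bins are all assumptions made for illustration.

```python
import numpy as np

def choice_probs(choice_logits):
    """Softmax over the logits of the answer-choice tokens only.

    choice_logits: array of shape (n_questions, n_choices).
    Assumption: the logits for tokens such as "A", "B", "C", "D" were
    already extracted from the LM's next-token distribution.
    """
    z = np.asarray(choice_logits, dtype=float)
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

def expected_calibration_error(probs, labels, n_bins=10):
    """ECE with equal-width confidence bins (n_bins is an illustrative choice)."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    conf = probs.max(axis=1)                                  # predictive confidence
    correct = (probs.argmax(axis=1) == labels).astype(float)  # 1.0 if prediction is right
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            # Bin weight times the gap between accuracy and mean confidence.
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece
```

An overconfident model shows up as mean confidence well above accuracy: for instance, two predictions made with confidence 1.0 of which only one is correct give an ECE of 0.5 under this sketch.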

Paper Structure

This paper contains 38 sections, 9 equations, 12 figures, 3 tables.

Figures (12)

  • Figure 1: Reliability diagram and confidence histogram of Llama-1 and Vicuna-v1.3 (33B) on MMLU (5-shot).
  • Figure 2: An example MCQ prompt for MMLU (0-shot).
  • Figure 3: Averaged out-of-the-box calibration results across all datasets and choice formats.
  • Figure 4: The accuracy, ECE, and average predictive confidence of ZSL and ICL with choice formats "A" and "(A)" on MMLU. We also report the sum of the choice letters' probabilities for choice format "A" and the probability of the format identifier for choice format "(A)".
  • Figure 5: ZSL results of different alignment stages on MMLU validation set.
  • ...and 7 more figures