
On the Calibration of Large Language Models and Alignment

Chiwei Zhu, Benfeng Xu, Quan Wang, Yongdong Zhang, Zhendong Mao

TL;DR

Calibration of large language models is crucial for reliability, particularly in high-stakes domains. This work systematically studies calibration across the full lifecycle of aligned LLMs, covering both pretraining and alignment, using three evaluation tasks (causal language modeling (CLM), facts generation, and multi-task understanding) and formal metrics (reliability diagrams and expected calibration error, ECE). It finds that larger parameter scales and longer pretraining improve calibration, while instruction tuning during alignment often harms it, although parameter-efficient tuning and RLHF can mitigate or stabilize this effect. The results offer practical guidance for building more factual, trustworthy assistants and highlight the role of data diversity in constructing instruction data.
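The two metrics named above share one computation: predictions are grouped into confidence bins, and each bin's mean confidence is compared with its accuracy. A reliability diagram plots these per-bin pairs, and ECE is their count-weighted average gap. The sketch below is a minimal illustration assuming 10 equal-width bins; the bin count and binning scheme used in the paper are not stated here, and the function names `reliability_bins` and `expected_calibration_error` are hypothetical.

```python
from typing import List, Sequence, Tuple


def reliability_bins(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> List[Tuple[int, float, float]]:
    """Group predictions into equal-width confidence bins (hypothetical helper).

    Returns one (count, mean_confidence, accuracy) tuple per bin; empty bins
    report (0, 0.0, 0.0). Bin i covers [i/n_bins, (i+1)/n_bins), and a
    confidence of exactly 1.0 is placed in the last bin.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    hit_sums = [0] * n_bins
    for conf, is_correct in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError(f"confidence {conf} is outside [0, 1]")
        idx = min(int(conf * n_bins), n_bins - 1)
        counts[idx] += 1
        conf_sums[idx] += conf
        hit_sums[idx] += int(is_correct)
    return [
        (n, c / n, h / n) if n else (0, 0.0, 0.0)
        for n, c, h in zip(counts, conf_sums, hit_sums)
    ]


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """ECE = sum over non-empty bins b of (|B_b| / N) * |acc(B_b) - conf(B_b)|."""
    total = len(confidences)
    if total == 0:
        raise ValueError("need at least one prediction")
    return sum(
        (n / total) * abs(acc - mean_conf)
        for n, mean_conf, acc in reliability_bins(confidences, correct, n_bins)
        if n
    )


if __name__ == "__main__":
    # Toy data: overconfident at 0.95 (half correct), underconfident at 0.55 (all correct).
    confs = [0.95, 0.95, 0.55, 0.55]
    hits = [True, False, True, True]
    print(f"ECE = {expected_calibration_error(confs, hits):.3f}")  # prints ECE = 0.450
```

Equal-width bins keep the sketch simple; with small evaluation sets many bins may be empty or sparsely populated, which is one reason some work prefers equal-mass (quantile) bins instead.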

Abstract

As large language models attract increasing attention and find widespread application, concerns about their reliability grow alongside them. Confidence calibration, an effective method for gauging the reliability of deep models, serves as a crucial tool for assessing and improving their reliability. However, the calibration of large language models has been comparatively underexplored. In this work, we conduct a systematic examination of the calibration of aligned language models throughout the entire construction process, including pretraining and alignment training. At each stage, we investigate how different training settings, such as parameter scales and training data, affect model calibration. To assess model calibration thoroughly, we evaluate models on three key aspects: generation, factuality, and understanding. Our work sheds light on whether popular LLMs are well-calibrated and how the training process influences model calibration.
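Calibration metrics need a confidence and a correctness label for every example. For multiple-choice understanding tasks, one common convention, assumed here rather than taken from the paper, is to read one logit per answer option, apply a softmax over those options, and use the top option's probability as the confidence. The sketch below illustrates that convention; `option_confidence` and `score_example` are hypothetical names, and obtaining the option logits from a specific model is left out.

```python
import math
from typing import Sequence, Tuple


def option_confidence(option_logits: Sequence[float]) -> Tuple[int, float]:
    """Return (index of the top-scoring option, its softmax probability).

    Hypothetical helper: assumes one logit per answer option (e.g. for the
    tokens "A", "B", "C", "D") has already been read from the model.
    """
    if len(option_logits) == 0:
        raise ValueError("need at least one option logit")
    # Subtract the max logit before exponentiating for numerical stability.
    top = max(option_logits)
    weights = [math.exp(x - top) for x in option_logits]
    total = sum(weights)
    probs = [w / total for w in weights]
    # max() returns the first maximal index, so ties resolve to the lowest index.
    pred = max(range(len(probs)), key=lambda i: probs[i])
    return pred, probs[pred]


def score_example(option_logits: Sequence[float], gold_index: int) -> Tuple[float, bool]:
    """Produce the (confidence, correct) pair that calibration metrics consume."""
    pred, conf = option_confidence(option_logits)
    return conf, pred == gold_index
```

Restricting the softmax to the answer options, rather than the full vocabulary, keeps the confidences comparable across models that spread probability mass differently over unrelated tokens; other confidence definitions are possible for free-form generation tasks.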

Paper Structure

This paper contains 35 sections, 4 equations, 12 figures, 3 tables.

Figures (12)

  • Figure 1: Scope of investigations in this paper.
  • Figure 2: Reliability diagram for a Pythia-70m model.
  • Figure 3: Model calibration of different parameter scales.
  • Figure 4: Model calibration of different training dynamics.
  • Figure 5: Model calibration using different alignment training settings.
  • ...and 7 more figures