On the Calibration of Large Language Models and Alignment

Chiwei Zhu; Benfeng Xu; Quan Wang; Yongdong Zhang; Zhendong Mao


TL;DR

Calibration of large language models is crucial for reliability, particularly in high-stakes domains. This work systematically studies calibration across the full lifecycle of aligned LLMs (pretraining and alignment training), using three evaluation tasks (causal language modeling, facts generation, and multi-task language understanding) and two formal tools: reliability diagrams and expected calibration error (ECE). It finds that larger parameter scales and longer pretraining improve calibration, while instruction tuning during alignment often harms it, though parameter-efficient tuning and RLHF can mitigate the damage or keep calibration stable. The results offer practical guidance for building more factual, trustworthy assistants and highlight the role of data diversity when constructing instruction data.
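Since ECE is the central metric in this summary, a minimal sketch of how it is commonly computed may help. It uses the standard definition: predictions are grouped into confidence bins, and ECE is the sample-weighted average of |accuracy - mean confidence| over the bins. The function name, the choice of 10 equal-width bins, and the toy inputs are assumptions for illustration; this page does not state the paper's exact binning.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard binned ECE (illustrative sketch, not the paper's code).

    confidences: predicted probabilities in [0, 1] for the chosen answer.
    correct:     booleans, True where the chosen answer was right.
    n_bins:      number of equal-width bins (10 is an assumed default).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have equal length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    # Bin m holds confidences in [m/n_bins, (m+1)/n_bins);
    # a confidence of exactly 1.0 is placed in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        # Weight each bin's gap by its share of all samples.
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece


# Toy usage: two predictions at full confidence, only one of them correct.
overconfident = expected_calibration_error([1.0, 1.0], [True, False])
```

The per-bin pairs of mean confidence and accuracy are the same quantities a reliability diagram plots, so a perfectly calibrated model lies on the diagonal and has an ECE of zero.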

Abstract

As large language models attract increasing attention and find widespread application, challenges of reliability arise alongside them. Confidence calibration, an effective analysis method for gauging the reliability of deep models, serves as a crucial tool for assessing and improving their reliability. However, the calibration of large language models remains comparatively underexplored. In this work, we conduct a systematic examination of the calibration of aligned language models throughout the entire construction process, including pretraining and alignment training. At each stage, we investigate how different training settings, such as parameter scales and training data, affect model calibration. To thoroughly assess model calibration, we evaluate models on the three aspects of greatest concern: generation, factuality, and understanding. Our work sheds light on whether popular LLMs are well-calibrated and how the training process influences model calibration.


Paper Structure (35 sections, 4 equations, 12 figures, 3 tables)


Introduction
For pretraining of LLMs:
For alignment of LLMs:
For different tasks:
Related Work
Confidence Calibration
Analysis of Large Language Models
Definitions
Confidence Calibration
Reliability Diagram
Expected Calibration Error (ECE)
Calibration Evaluation Tasks and Data
Causal Language Modeling
Facts Generation
Multi-task Language Understanding
...and 20 more sections

Figures (12)

Figure 1: Scope of investigations in this paper.
Figure 2: Reliability diagram for a Pythia-70m model.
Figure 3: Model calibration of different parameter scales.
Figure 4: Model calibration of different training dynamics.
Figure 5: Model calibration using different alignment training settings.
...and 7 more figures
