On the Calibration of Large Language Models and Alignment
Chiwei Zhu, Benfeng Xu, Quan Wang, Yongdong Zhang, Zhendong Mao
TL;DR
Calibration of large language models is crucial for reliability, particularly in high-stakes domains. This work systematically studies calibration across the full lifecycle of aligned LLMs, covering both pretraining and alignment, using three evaluation tasks (causal language modeling (CLM), facts generation, and multi-task understanding) and formal metrics (reliability diagrams and expected calibration error, ECE). It finds that larger parameter scales and longer pretraining improve calibration, while instruction tuning during alignment often harms calibration, though parameter-efficient tuning and RLHF can mitigate the harm or keep calibration stable. The results offer practical guidance for building more factual, trustworthy assistants and highlight the role of data diversity in instruction data construction.
Abstract
As large language models attract increasing attention and find widespread application, challenges of reliability arise alongside them. Confidence calibration, an effective analysis method for gauging the reliability of deep models, serves as a crucial tool for assessing and improving their reliability. However, the calibration of large language models remains comparatively underexplored. In this work, we conduct a systematic examination of the calibration of aligned language models throughout the entire construction process, including pretraining and alignment training. At each stage, we investigate how different training settings, such as parameter scales and training data, affect model calibration. To thoroughly assess model calibration, we evaluate models on the three aspects of greatest concern: generation, factuality, and understanding. Our work sheds light on whether popular LLMs are well-calibrated and how the training process influences model calibration.
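
For readers unfamiliar with the ECE metric named in the TL;DR, the following is a minimal sketch of the commonly used equal-width binned estimator: predictions are grouped by confidence, and the gap between mean confidence and accuracy in each bin is averaged, weighted by bin size. It is an illustration only and not necessarily the evaluation code used in this work; the function name `expected_calibration_error` and the default of 10 bins are assumptions.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sum over bins of (bin fraction) * |accuracy - mean confidence|.

    confidences: predicted probability of the chosen answer, each in [0, 1].
    correct:     1 if the prediction was right, 0 otherwise.
    n_bins:      number of equal-width confidence bins (assumed default: 10).
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)

    # Interior bin edges, e.g. 0.1, 0.2, ..., 0.9 for 10 bins.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Right-inclusive bins (lo, hi]; a confidence of exactly 0 falls in bin 0.
    bin_ids = np.digitize(confidences, edges[1:-1], right=True)

    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue  # empty bins contribute nothing
        accuracy = correct[mask].mean()
        mean_confidence = confidences[mask].mean()
        # mask.mean() is the fraction of all samples that fall in this bin.
        ece += mask.mean() * abs(accuracy - mean_confidence)
    return float(ece)


# Hypothetical data: the model claims 90% confidence but is right 3 times out of 4.
example = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0])
```

A perfectly calibrated model yields an ECE of 0; plotting per-bin accuracy against per-bin mean confidence gives the corresponding reliability diagram.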
