FIRST: Teach A Reliable Large Language Model Through Efficient Trustworthy Distillation

KaShun Shum; Minrui Xu; Jianshu Zhang; Zixin Chen; Shizhe Diao; Hanze Dong; Jipeng Zhang; Muhammad Omer Raza

TL;DR

We propose Efficient Trustworthy Distillation (FIRST), a method that uses a small portion of the teacher's knowledge to obtain a reliable language model in a cost-efficient way. Before transferring this small portion of concentrated knowledge to the student, FIRST applies a "trustworthy maximization" process to optimize how it is used.

Abstract

Large language models (LLMs) have become increasingly prevalent in our daily lives, leading to an expectation for LLMs to be trustworthy: both accurate and well-calibrated (the prediction confidence should align with its ground-truth correctness likelihood). Fine-tuning has become the most popular method for adapting a model to practical usage because it significantly increases accuracy on downstream tasks. Despite the accuracy it achieves, we find that fine-tuning still falls far short of satisfactory trustworthiness due to "tuning-induced mis-calibration". In this paper, we examine why and how mis-calibration arises in fine-tuned models, and how distillation can alleviate the issue. We then propose a new method named Efficient Trustworthy Distillation (FIRST), which utilizes a small portion of the teacher's knowledge to obtain a reliable language model in a cost-efficient way. Specifically, we identify the "concentrated knowledge" phenomenon during distillation, which can significantly reduce the computational burden. We then apply a "trustworthy maximization" process to optimize the utilization of this small portion of concentrated knowledge before transferring it to the student. Experimental results demonstrate the effectiveness of our method: better accuracy (+2.3%) and less mis-calibration (-10%) are achieved on average across both in-domain and out-of-domain scenarios, indicating better trustworthiness.
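
To make the notion of "well-calibrated" concrete, the following is a minimal sketch of an expected calibration error (ECE) computation, the kind of metric the paper lists among its preliminaries. It is not the authors' implementation: the function name, the equal-width binning, and the default of 10 bins are assumptions chosen for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Illustrative ECE: bin-size-weighted mean of |accuracy - confidence|.

    confidences: predicted probabilities in [0, 1], one per prediction.
    correct:     booleans, True where the prediction was right.
    n_bins:      number of equal-width confidence bins (assumed, not from the paper).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        return 0.0

    # Assign each prediction to a bin [i/n_bins, (i+1)/n_bins);
    # a confidence of exactly 1.0 goes into the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's confidence/accuracy gap by its share of predictions.
        ece += (len(members) / total) * abs(accuracy - avg_conf)
    return ece


if __name__ == "__main__":
    # Four predictions at 90% confidence, three of them correct:
    # the model is over-confident by roughly 0.15.
    print(expected_calibration_error([0.9] * 4, [True, True, True, False]))
```

A value near zero means stated confidence tracks observed accuracy; larger values indicate over- or under-confidence.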

Paper Structure (32 sections, 6 equations, 6 figures, 3 tables)

Introduction
Related Work
Trustworthy Models
Knowledge Distillation
Preliminaries
Concentrated Knowledge
Tuning-induced Mis-calibration
Expected Calibration Error
Trustworthy Score
Efficient Trustworthy Distillation
Efficient Knowledge Selection
Trustworthy Maximization
Label Smoothing
Temperature Scaling
Knowledge Matching
...and 17 more sections

Figures (6)

Figure 1: A trustworthy model should be both accurate (left) and well-calibrated (right). A well-calibrated model should produce high probabilities for the correct answer and low probabilities for the wrong answer.
Figure 2: The blue line with range shows the averaged accumulated probability coverage for each token entry, from Top-1 to Top-100. "Concentrated Knowledge": the red point shows that the accumulated probability of the Top-5 tokens already exceeds 95%. The green line shows the disk usage when the Top-K token distribution is used during distillation.
Figure 3: "Tuning-induced Mis-calibration": position-wise prediction probabilities with the corresponding actual accuracy of (a) the fine-tuned teacher model, (b) the fine-tuned small model, (c) the distilled model, and (d) the model produced by FIRST.
Figure 4: (a) The Trustworthy Maximization step: we first fine-tune the teacher model, then generate the top-5 probabilities of all tokens and run a grid search to select the optimal temperature based on the validation set. (b) The overall Efficient Trustworthy Distillation pipeline: based on the optimal temperature selected in (a), we obtain a well-calibrated student model by knowledge matching between the student's knowledge and the selected portion of the teacher's knowledge.
Figure 5: Reliability diagrams based on Llama-1 reveal the mis-calibration of various models on the CSQA dataset. In these diagrams, the X-axis is confidence divided into 10 bins, representing the model's confidence levels for each question's answer tokens. The Y-axis represents the accuracy within each bin. The red bar represents the degree to which the actual accuracy is higher than perfect calibration (under-confident), while the green bar means that the actual accuracy is lower than perfect calibration (over-confident).
...and 1 more figure
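
The captions for Figures 2 and 4 describe two mechanical steps: measuring how much probability mass the Top-K tokens cover, and rescaling the teacher distribution with a temperature before keeping only the top-5 entries. The sketch below illustrates both under stated assumptions. The function names, the toy logits, and the choice to renormalize the kept entries are hypothetical and are not taken from the paper; the criterion for the grid search over temperatures is not shown here because the captions do not specify it.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature flattens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp((x - peak) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_coverage(probs, k=5):
    """Accumulated probability of the k most likely tokens."""
    return sum(sorted(probs, reverse=True)[:k])


def top_k_renormalized(probs, k=5):
    """Keep only the k most likely tokens as {token_id: probability}.

    Renormalizing the kept mass to sum to 1 is an assumption made for
    this illustration.
    """
    top = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)[:k]
    mass = sum(p for _, p in top)
    return {token_id: p / mass for token_id, p in top}


if __name__ == "__main__":
    toy_logits = [5.0, 3.0, 1.0, 0.0, -1.0, -2.0, -3.0]  # hypothetical teacher logits
    probs = softmax(toy_logits)
    print("top-5 coverage:", top_k_coverage(probs, k=5))
    print("stored entries:", top_k_renormalized(probs, k=5))
    print("top-1 at T=2:", softmax(toy_logits, temperature=2.0)[0])
```

Storing five entries per token position instead of a full vocabulary-sized distribution is what makes the disk-usage curve in Figure 2 relevant: if the top-5 entries already cover most of the mass, little information is discarded.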
