Can We Trust LLMs? Mitigate Overconfidence Bias in LLMs through Knowledge Transfer
Haoyan Yang, Yixuan Wang, Xingyin Xu, Hanyuan Zhang, Yirong Bian
TL;DR
This work addresses overconfidence bias in large language models (LLMs) by introducing a knowledge-transfer (KT) framework built on chain-of-thought (CoT) reasoning. Larger LLMs generate detailed CoTs and confidence signals, which are used to fine-tune smaller LLMs so that they replicate the advanced reasoning and report calibrated confidence via Confidence-Calibrated Inference. Across multilingual multiple-choice and sentiment-analysis tasks, the KT approach substantially improves accuracy and calibration over vanilla and question-answer (QA) baselines, with notable gains on TruthfulQA and related datasets. The results indicate that transferring structured reasoning from big to small models can yield trustworthy, context-appropriate outputs, albeit with limitations such as potential token inflation and self-dialogue risks.
Abstract
This study explores mitigating overconfidence bias in LLMs to improve their reliability. We introduce a knowledge transfer (KT) method based on chain of thought (CoT), in which "big" LLMs impart knowledge to "small" LLMs via detailed, sequential reasoning paths. The method uses the advanced reasoning of larger models to fine-tune smaller models, enabling them to produce more accurate predictions with calibrated confidence. Experimental evaluation on multiple-choice questions and sentiment analysis across diverse datasets demonstrated the KT method's superiority over the vanilla and question-answer pair (QA) fine-tuning methods. The improvement was most significant in three key metrics, on which the KT method outperformed the vanilla and QA methods by an average of 55.3% and 43.1%, respectively. These findings underscore the KT method's potential for enhancing model trustworthiness and accuracy, offering precise outputs with well-matched confidence levels across various contexts.
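To make "calibrated confidence" concrete, the sketch below computes expected calibration error (ECE), a common measure of the gap between a model's stated confidence and its actual accuracy. The abstract does not name its three key metrics, so using ECE here is an assumption for illustration, and the function name, equal-width binning scheme, and default bin count are hypothetical choices rather than the authors' evaluation code.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Return ECE: the sample-weighted mean of |avg confidence - accuracy| per bin.

    confidences: stated confidences in [0, 1], one per prediction.
    correct:     booleans, True where the prediction was right.
    n_bins:      number of equal-width confidence bins (illustrative default).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        return 0.0

    # Group predictions into equal-width confidence bins; 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's confidence/accuracy gap by its share of all samples.
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece


# An overconfident model: claims 0.9 confidence but is right only 1 time in 4.
overconfident = expected_calibration_error([0.9] * 4, [True, False, False, False])

# A well-calibrated model: claims 0.75 confidence and is right 3 times in 4.
calibrated = expected_calibration_error([0.75] * 4, [True, True, True, False])
```

Under this measure, a lower value means stated confidence tracks accuracy more closely, which is the behavior the KT method aims to instill in the smaller model.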
