Uncertainty quantification in fine-tuned LLMs using LoRA ensembles
Oleksandr Balabanov, Hampus Linander
TL;DR
This work treats fine-tuning of large language models as a Bayesian update around a pre-trained prior and develops principled posterior approximations using ensembles of low-rank adapters (LoRA). By analyzing predictive entropy $H(t^*|s^*,\mathcal{D})$ and mutual information $\text{MI}(\theta,t^*|s^*,\mathcal{D})$ across LoRA ensemble members, the authors separate epistemic from aleatoric uncertainty and track how knowledge from the pre-trained model is retained or replaced during domain adaptation. They implement LoRA deep ensembles on Mistral-7B fine-tuned on CommonsenseQA (CQA) and evaluate on CQA, MMLU STEM, and MMLU SS, showing that small ensembles ($M=5$) yield posterior approximations nearly as good as those of larger ensembles while reducing overfitting and enabling detection of uncertain predictions. The findings reveal distinct uncertainty dynamics on in-domain and out-of-domain data, including unexpected retention of acquired knowledge in the overfitting regime, and demonstrate a practical framework for uncertainty-aware deployment and active learning in fine-tuned LLMs.
Abstract
Fine-tuning large language models can improve task-specific performance, but a general understanding of what the fine-tuned model has learned, what it has forgotten, and how far its predictions can be trusted is still missing. We derive principled uncertainty quantification for fine-tuned LLMs with posterior approximations using computationally efficient low-rank adaptation ensembles. We analyze three common multiple-choice datasets using low-rank adaptation ensembles based on Mistral-7B, and draw quantitative and qualitative conclusions on their perceived complexity and on the balance between retained prior knowledge and domain-specific adaptation during and after fine-tuning. We identify unexpected retention of acquired knowledge during fine-tuning in the overfitting regime.
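To make the two quantities named in the summary concrete, the following is a minimal sketch of how predictive entropy and mutual information could be estimated from an ensemble for multiple-choice questions. It assumes each of the $M$ ensemble members has already produced a probability vector over the answer choices, and that members are weighted equally as samples from the approximate posterior. The function name `ensemble_uncertainty`, the array layout, and the toy probabilities are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np


def ensemble_uncertainty(probs, eps=1e-12):
    """Estimate uncertainty from ensemble predictions (illustrative sketch).

    probs: array of shape (M, N, K) holding, for each of M ensemble members
    and N questions, a probability vector over K answer choices.
    Returns three arrays of shape (N,), all in nats.
    """
    probs = np.asarray(probs, dtype=float)

    # Ensemble-averaged predictive distribution, shape (N, K).
    mean_probs = probs.mean(axis=0)

    # Predictive entropy: entropy of the averaged distribution (total uncertainty).
    predictive_entropy = -(mean_probs * np.log(mean_probs + eps)).sum(axis=-1)

    # Expected entropy: average of each member's own entropy (aleatoric part).
    member_entropy = -(probs * np.log(probs + eps)).sum(axis=-1)  # shape (M, N)
    expected_entropy = member_entropy.mean(axis=0)

    # Mutual information: the gap between the two (epistemic part).
    mutual_information = predictive_entropy - expected_entropy
    return predictive_entropy, expected_entropy, mutual_information


# Toy data (assumed): M=2 members, N=2 questions, K=2 answer choices.
toy_probs = np.array([
    [[0.5, 0.5], [1.0, 0.0]],  # member 1
    [[0.5, 0.5], [0.0, 1.0]],  # member 2
])
pred_H, exp_H, mi = ensemble_uncertainty(toy_probs)
```

On the first toy question both members are equally unsure, so the entropy is high while the mutual information is close to zero. On the second, each member is confident but they disagree, so the mutual information is high. That disagreement is the signal the ensemble contributes beyond any single fine-tuned model.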
