C-LoRA: Contextual Low-Rank Adaptation for Uncertainty Estimation in Large Language Models
Amir Hossein Rahmati, Sanket Jantre, Weifeng Zhang, Yucheng Wang, Byung-Jun Yoon, Nathan M. Urban, Xiaoning Qian
TL;DR
Fine-tuning large language models with LoRA can yield overconfident predictions in low-data regimes. The authors introduce Contextual Low-Rank Adaptation (C-LoRA), a Bayesian, parameter-efficient approach that makes the LoRA adapters context-dependent via lightweight per-layer modules and a data-driven low-rank factorization, enabling sample-specific uncertainty estimates. Across six reasoning tasks with LLaMA2-7B, C-LoRA delivers well-calibrated uncertainty (low expected calibration error, ECE, and low negative log-likelihood, NLL) and robust generalization, often outperforming state-of-the-art uncertainty-aware LoRA methods, with ablations confirming the importance of the contextual module. Experiments are limited to 7B models; the paper describes the method as architecture-agnostic in principle and discusses limitations and future directions, including scaling to larger models, multimodal settings, and active-learning applications.
Abstract
Low-Rank Adaptation (LoRA) offers a cost-effective solution for fine-tuning large language models (LLMs), but it often produces overconfident predictions in data-scarce few-shot settings. To address this issue, several classical statistical learning approaches have been repurposed for scalable uncertainty-aware LoRA fine-tuning. However, these approaches neglect how input characteristics affect the predictive uncertainty estimates. To overcome this limitation, we propose Contextual Low-Rank Adaptation (C-LoRA), a novel uncertainty-aware and parameter-efficient fine-tuning approach that introduces new lightweight LoRA modules contextualized to each input data sample in order to dynamically adapt uncertainty estimates. By incorporating data-driven contexts into the parameter posteriors, C-LoRA mitigates overfitting, achieves well-calibrated uncertainties, and yields robust predictions. Extensive experiments on LLaMA2-7B models demonstrate that C-LoRA consistently outperforms the state-of-the-art uncertainty-aware LoRA methods in both uncertainty quantification and model generalization. Ablation studies further confirm the critical role of our contextual modules in capturing sample-specific uncertainties. C-LoRA sets a new standard for robust, uncertainty-aware LLM fine-tuning in few-shot regimes. Although our experiments are limited to 7B models, our method is architecture-agnostic and, in principle, applies beyond this scale; studying its scaling to larger models remains an open problem. Our code is available at https://github.com/ahra99/c_lora.
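For readers unfamiliar with the calibration metrics behind the "well-calibrated" claim, the following is a minimal sketch of how expected calibration error (ECE) and negative log-likelihood (NLL) are commonly computed from predicted class probabilities. It is not taken from the C-LoRA code base: the function names, the equal-width binning with 10 bins, and the toy inputs are assumptions made for illustration only.

```python
import math


def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted mean of |accuracy - confidence| over equal-width confidence bins.

    probs:  list of per-sample class-probability lists (each sums to 1).
    labels: list of integer class indices.
    Assumption: bins are [k/n_bins, (k+1)/n_bins), with confidence 1.0
    placed in the last bin.
    """
    n = len(labels)
    # Per bin: [sample count, sum of confidences, number of correct predictions]
    bins = [[0, 0.0, 0] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        conf = max(p)              # confidence = probability of the predicted class
        pred = p.index(conf)       # predicted class (first argmax on ties)
        b = min(int(conf * n_bins), n_bins - 1)
        bins[b][0] += 1
        bins[b][1] += conf
        bins[b][2] += int(pred == y)

    ece = 0.0
    for count, conf_sum, correct in bins:
        if count:
            accuracy = correct / count
            mean_conf = conf_sum / count
            ece += (count / n) * abs(accuracy - mean_conf)
    return ece


def negative_log_likelihood(probs, labels, eps=1e-12):
    """Mean negative log-probability assigned to the true class."""
    total = sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels))
    return -total / len(labels)


if __name__ == "__main__":
    # Toy predictions for a binary task (illustrative values, not paper results).
    toy_probs = [[0.95, 0.05], [0.85, 0.15], [0.65, 0.35], [0.25, 0.75]]
    toy_labels = [0, 1, 0, 1]
    print("ECE:", expected_calibration_error(toy_probs, toy_labels))
    print("NLL:", negative_log_likelihood(toy_probs, toy_labels))
```

Lower values are better for both metrics. ECE compares confidence with empirical accuracy inside each bin, so an overconfident model (high confidence, lower accuracy) scores poorly even when its top-1 accuracy is good; NLL additionally penalizes low probability on the true class for every sample.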
