Small Updates, Big Doubts: Does Parameter-Efficient Fine-tuning Enhance Hallucination Detection?
Xu Hu, Yifan Zhang, Songtao Wei, Chen Zhao, Qiannan Li, Bingzhe Li, Feng Chen
TL;DR
This study systematically examines how parameter-efficient fine-tuning (PEFT) influences hallucination detection in large language models, testing LoRA, PiSSA, and DoRA across three open-weight backbones (LLaMA, Mistral, Qwen) and three QA benchmarks (TriviaQA, NQ-Open, SQuAD). It evaluates seven detectors spanning semantic-consistency, confidence-based, and entropy-based signals, plus white-box linear probes of hidden representations. The results show only modest improvements in QA accuracy with PEFT, but consistent and substantial gains in hallucination detection performance, particularly for semantic-consistency and confidence-based detectors, suggesting that PEFT reshapes uncertainty signaling rather than injecting facts. PEFT appears to act as an epistemic regularizer: it makes model errors more detectable, although it sometimes disrupts linear-probe detectors, and PiSSA and DoRA offer task-specific advantages. These findings have practical implications for deploying uncertainty-aware LLM systems, for safety tooling, and for future research into probing methods that remain robust to fine-tuning-induced representational shifts.
Abstract
Parameter-efficient fine-tuning (PEFT) methods are widely used to adapt large language models (LLMs) to downstream tasks and are often assumed to improve factual correctness. However, how PEFT methods affect hallucination behavior remains insufficiently understood, especially on QA datasets. In this work, we systematically investigate the impact of PEFT on hallucination detection through a comprehensive empirical study across three open-weight LLM backbones and three fact-seeking QA benchmarks. For each model, we evaluate performance using seven unsupervised hallucination detection methods spanning three complementary approaches: semantic-consistency-based, confidence-based, and entropy-based detectors. This multifaceted evaluation enables us to characterize how PEFT reshapes uncertainty across different detection paradigms. Our experimental results show that PEFT consistently strengthens hallucination detection, substantially improving AUROC across a wide range of hallucination detectors. Further analyses using linear probes and representation diagnostics indicate that PEFT methods primarily reshape how uncertainty is encoded and surfaced, rather than injecting new factual knowledge into the models.
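To make the evaluation metric concrete, the following is a minimal sketch of how an entropy-based detector can be scored with AUROC. It is not the paper's implementation: the exact-match grouping of sampled answers, the toy data, and the function names answer_entropy and auroc are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def answer_entropy(samples):
    """Entropy (in nats) of the empirical distribution over sampled answers.

    Assumption: answers are grouped by case-insensitive exact match, a crude
    stand-in for the semantic clustering a real detector would use.
    """
    counts = Counter(s.strip().lower() for s in samples)
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def auroc(scores, labels):
    """Probability that a hallucinated answer (label 1) receives a higher
    score than a correct answer (label 0); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one example of each class")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: several sampled answers per question, and a label
# marking whether the model's main answer was a hallucination (1) or not (0).
sampled = [
    ["Paris", "Paris", "paris", "Paris"],
    ["1912", "1915", "1912", "1920"],
    ["Mars", "Mars", "Venus", "Mars"],
    ["blue", "red", "green", "teal"],
]
hallucinated = [0, 1, 0, 1]

scores = [answer_entropy(s) for s in sampled]
print(auroc(scores, hallucinated))
```

Under this framing, a fine-tuned model can have similar QA accuracy to its base model and still yield a higher AUROC, provided its uncertainty scores separate correct from hallucinated answers more cleanly.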
