Small Updates, Big Doubts: Does Parameter-Efficient Fine-tuning Enhance Hallucination Detection?

Xu Hu; Yifan Zhang; Songtao Wei; Chen Zhao; Qiannan Li; Bingzhe Li; Feng Chen

TL;DR

This study systematically examines how parameter-efficient fine-tuning (PEFT) influences hallucination detection in large language models, testing LoRA, PiSSA, and DoRA across three open-weight backbones (LLaMA, Mistral, Qwen) and three QA benchmarks (TriviaQA, NQ-Open, SQuAD). It evaluates seven detectors spanning semantic-consistency, confidence-based, and entropy-based signals, plus white-box linear probes that examine hidden representations. The results show only modest improvements in QA accuracy with PEFT but consistent and substantial gains in hallucination detection performance, particularly for semantic-consistency and confidence-based detectors, suggesting that PEFT reshapes how uncertainty is signaled rather than injecting facts. PEFT appears to act as an epistemic regularizer: it makes model errors more detectable, although it sometimes disrupts linear-probe detectors, and PiSSA and DoRA offer task-specific advantages. These findings have practical implications for deploying uncertainty-aware LLM systems, for safety tooling, and for future research into probing methods that remain robust to fine-tuning-induced representational shifts.

Abstract

Parameter-efficient fine-tuning (PEFT) methods are widely used to adapt large language models (LLMs) to downstream tasks and are often assumed to improve factual correctness. However, how PEFT methods affect hallucination behavior remains insufficiently understood, especially on QA datasets. In this work, we systematically investigate the impact of PEFT on hallucination detection through a comprehensive empirical study across three open-weight LLM backbones and three fact-seeking QA benchmarks. For each model, we evaluate performance using seven unsupervised hallucination detection methods spanning three complementary approaches: semantic-consistency-based, confidence-based, and entropy-based detectors. This multifaceted evaluation enables us to characterize how PEFT reshapes uncertainty across different detection paradigms. Our experimental results show that PEFT consistently strengthens hallucination detection ability, substantially improving AUROC across a wide range of hallucination detectors. Further analyses using linear probes and representation diagnostics indicate that PEFT methods primarily reshape how uncertainty is encoded and surfaced, rather than injecting new factual knowledge into the models.
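
As an illustration of how a detector can be scored with AUROC, here is a minimal Python sketch. It assumes each answer comes with a scalar uncertainty score (higher means less confident) and a binary hallucination label. The function name `auroc` and the toy scores are hypothetical and are not taken from the paper; the seven detectors evaluated in the paper are not implemented here.

```python
def auroc(uncertainty, is_hallucination):
    """Rank-based AUROC: the probability that a hallucinated answer
    receives a higher uncertainty score than a correct one (ties count 0.5).

    Both arguments are hypothetical inputs for illustration:
    uncertainty      -- list of floats, one score per answer
    is_hallucination -- list of bools, True if the answer is wrong
    """
    pos = [u for u, h in zip(uncertainty, is_hallucination) if h]
    neg = [u for u, h in zip(uncertainty, is_hallucination) if not h]
    if not pos or not neg:
        raise ValueError("need at least one hallucinated and one correct answer")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data only (not results from the paper).
labels = [False, True, False, True, False, True]
scores = [0.10, 0.20, 0.15, 0.80, 0.30, 0.70]
print(f"AUROC: {auroc(scores, labels):.3f}")
```

An AUROC of 0.5 means the detector ranks hallucinated and correct answers no better than chance, and 1.0 means it separates them perfectly. Comparing this value for the same detector before and after fine-tuning is one way to read the kind of improvement the abstract describes.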

Paper Structure (28 sections, 2 equations, 13 figures, 8 tables)

This paper contains 28 sections, 2 equations, 13 figures, 8 tables.

Introduction
Related Work
Experiment
Experimental Setup
Parameter-efficient fine-tuning methods.
Black-box hallucination detectors.
White-box hallucination detector: linear probe.
Evaluation metrics.
Experimental Results
Takeaway #1: PEFT yields modest hallucination mitigation but significant hallucination detection improvement on QA datasets.
Takeaway #2: Semantic-consistency-based and confidence-based hallucination detectors improve significantly after PEFT, whereas entropy-based detectors improve only marginally.
Takeaway #3: PEFT improves the performance of hallucination detectors by shifting scores away from the overconfident regime.
Behavior Analysis
Statistical Analysis: Tracking Dangerous-to-Detectable Migration
Takeaway #4: PiSSA is the best safety protector; DoRA is the most effective knowledge corrector in open-domain QA, while LoRA consistently and clearly rectifies dangerous cases on SQuAD.
...and 13 more sections

Figures (13)

Figure 1: Overview of our empirical study of how hallucination detection ability is affected by parameter-efficient fine-tuning.
Figure 2: Test accuracy across three backbones, three datasets, and four methods. Each panel shows a 3D waterfall visualization where the x-axis shows datasets, the y-axis shows methods, and the z-axis shows test accuracy (%). In this paper, we call a change marginal when it is within 1%.
Figure 3: Uncertainty score density distributions across PEFT methods on Qwen-NQ-Open. X-axis represents the uncertainty score.
Figure 4: Uncertainty-correctness quadrant.
Figure 5: Uncertainty-correctness analysis on NQ-Open using MSP (Llama-3.2-3B). Reliability increases from 65.11% (Base) to 72.74% (PiSSA), indicating that responses become clearly more trustworthy after PEFT. The danger ratio also decreases by up to 5% (with PiSSA).
...and 8 more figures
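
The uncertainty-correctness quadrant named in Figures 4 and 5 can be sketched in Python as follows. The captions above do not define the quadrants, so this sketch makes its own assumptions: an answer counts as "uncertain" when its score exceeds a chosen threshold, "dangerous" means wrong but confident, and "detectable" means wrong but uncertain. The function name, quadrant names, threshold, and toy data are all hypothetical, and the paper's exact definitions of reliability and danger ratio may differ.

```python
from collections import Counter


def quadrant_fractions(uncertainty, is_correct, threshold):
    """Split answers into four uncertainty-correctness quadrants.

    Assumed (not from the paper) quadrant definitions:
      confident_correct -- correct, uncertainty <= threshold
      uncertain_correct -- correct, uncertainty >  threshold
      dangerous         -- wrong,   uncertainty <= threshold
      detectable        -- wrong,   uncertainty >  threshold
    """
    if len(uncertainty) != len(is_correct) or not uncertainty:
        raise ValueError("inputs must be non-empty and of equal length")

    # Key: (answer is correct, answer is flagged as uncertain)
    names = {
        (True, False): "confident_correct",
        (True, True): "uncertain_correct",
        (False, False): "dangerous",
        (False, True): "detectable",
    }
    counts = Counter(
        names[(bool(c), u > threshold)] for u, c in zip(uncertainty, is_correct)
    )
    total = len(uncertainty)
    return {name: counts[name] / total for name in names.values()}


# Toy data only (not results from the paper).
scores = [0.1, 0.2, 0.9, 0.3, 0.8]
correct = [True, True, True, False, False]
fractions = quadrant_fractions(scores, correct, threshold=0.5)
for name, value in fractions.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions, a migration from dangerous to detectable would show up as the `dangerous` fraction shrinking and the `detectable` fraction growing when the same questions are scored before and after fine-tuning.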
