Hallucination Detection in LLMs: Fast and Memory-Efficient Fine-Tuned Models
Gabriel Y. Arteaga, Thomas B. Schön, Nicolas Pielawski
TL;DR
The paper tackles hallucination detection in LLMs under resource constraints by introducing a memory-efficient ensemble method that couples pre-trained weights with BatchEnsemble and LoRA adapters. Uncertainty signals from the ensemble feed a downstream binary classifier to detect both faithfulness and factual hallucinations, enabling fast training and inference on a single GPU. Empirical results show strong faithfulness detection (≈97.8%) and solid factual detection (≈68%), with practical advantages in time and memory, though out-of-distribution and some factual cases remain challenging. The approach offers a scalable path toward safer LLM deployment in high-stakes settings by combining efficient fine-tuning with uncertainty-based detection, and suggests avenues for richer uncertainty metrics and broader model evaluation in future work.
Abstract
Uncertainty estimation is a necessary component when implementing AI in high-risk settings, such as autonomous cars, medicine, or insurance. Large Language Models (LLMs) have seen a surge in popularity in recent years, but they are subject to hallucinations, which may cause serious harm in high-risk settings. Despite their success, LLMs are expensive to train and run: they require a large amount of computation and memory, which prevents the use of ensembling methods in practice. In this work, we present a novel method that allows for fast and memory-friendly training of LLM ensembles. We show that the resulting ensembles can detect hallucinations and are a viable approach in practice, as only one GPU is needed for training and inference.
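The pipeline summarized above (a shared set of weights, cheap per-member modulation, and uncertainty signals passed to a classifier) can be illustrated with a small NumPy sketch. This is a toy under stated assumptions, not the authors' implementation: it uses the standard BatchEnsemble rank-1 parameterization, in which member i uses W * outer(r_i, s_i), on a single linear layer, it omits the LoRA adapters entirely, and the three features (predictive entropy, expected entropy, mutual information) are common choices assumed here for illustration. All function names, shapes, and values are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def batch_ensemble_logits(x, W, R, S):
    """Logits of every ensemble member for one input vector.

    x: (d_in,) input, W: (d_in, d_out) shared weights,
    R: (M, d_in) and S: (M, d_out) per-member rank-1 factors.
    Member i effectively uses W * outer(R[i], S[i]), but the product is
    computed without materializing M full weight matrices.
    """
    return ((x[None, :] * R) @ W) * S  # shape (M, d_out)

def uncertainty_features(probs, eps=1e-12):
    """Assumed feature set computed from member distributions, probs: (M, V)."""
    mean_p = probs.mean(axis=0)
    predictive_entropy = -np.sum(mean_p * np.log(mean_p + eps))
    expected_entropy = -np.mean(np.sum(probs * np.log(probs + eps), axis=1))
    # Mutual information measures disagreement between members.
    mutual_information = predictive_entropy - expected_entropy
    return np.array([predictive_entropy, expected_entropy, mutual_information])

# Toy example with random (untrained) weights; sizes are arbitrary.
rng = np.random.default_rng(0)
d_in, vocab, members = 8, 5, 4
W = rng.normal(size=(d_in, vocab))
R = 1.0 + 0.1 * rng.normal(size=(members, d_in))
S = 1.0 + 0.1 * rng.normal(size=(members, vocab))
x = rng.normal(size=d_in)

logits = batch_ensemble_logits(x, W, R, S)
features = uncertainty_features(softmax(logits))
print(features)  # one feature vector per prediction
```

In a full system, one such feature vector would be collected per generated token or answer and passed to a binary classifier trained on labeled hallucinated and non-hallucinated examples. The memory saving comes from storing only the vectors in R and S per member instead of a full copy of W.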
