Hallucination Detection in LLMs: Fast and Memory-Efficient Fine-Tuned Models

Gabriel Y. Arteaga, Thomas B. Schön, Nicolas Pielawski

TL;DR

The paper tackles hallucination detection in LLMs under resource constraints by introducing a memory-efficient ensemble method that couples pre-trained weights with BatchEnsemble and LoRA adapters. Uncertainty signals from the ensemble feed a downstream binary classifier to detect both faithfulness and factual hallucinations, enabling fast training and inference on a single GPU. Empirical results show strong faithfulness detection (≈97.8%) and solid factual detection (≈68%), with practical advantages in time and memory, though out-of-distribution and some factual cases remain challenging. The approach offers a scalable path toward safer LLM deployment in high-stakes settings by combining efficient fine-tuning with uncertainty-based detection, and suggests avenues for richer uncertainty metrics and broader model evaluation in future work.
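
As a rough illustration of how ensemble outputs could be turned into uncertainty features for the downstream classifier, the sketch below computes three common quantities from the members' next-token distributions: predictive entropy, expected member entropy, and their difference (mutual information). This summary does not say which metrics the authors use, so the choice of features, the function names, and the toy probabilities are all assumptions made for illustration.

```python
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy in nats; eps guards against log(0)."""
    return -np.sum(p * np.log(p + eps), axis=axis)

def uncertainty_features(member_probs):
    """member_probs: array of shape (n_members, vocab_size), one
    next-token distribution per ensemble member (hypothetical input)."""
    mean_probs = member_probs.mean(axis=0)
    total = entropy(mean_probs)               # predictive entropy of the averaged distribution
    aleatoric = entropy(member_probs).mean()  # average entropy of the individual members
    epistemic = total - aleatoric             # mutual information: disagreement between members
    return np.array([total, aleatoric, epistemic])

# Toy inputs: three members that agree, and three that confidently disagree.
agree = np.tile(np.array([0.7, 0.2, 0.1]), (3, 1))
disagree = np.array([[0.98, 0.01, 0.01],
                     [0.01, 0.98, 0.01],
                     [0.01, 0.01, 0.98]])

f_agree = uncertainty_features(agree)
f_disagree = uncertainty_features(disagree)
```

When the members agree, the mutual-information term is close to zero; when they disagree, it grows. Feature vectors of this kind, collected per generated token or per answer, are the sort of input a binary hallucination classifier could be trained on.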

Abstract

Uncertainty estimation is a necessary component when implementing AI in high-risk settings, such as autonomous cars, medicine, or insurance. Large Language Models (LLMs) have seen a surge in popularity in recent years, but they are subject to hallucinations, which may cause serious harm in high-risk settings. Despite their success, LLMs are expensive to train and run: they need a large amount of computation and memory, preventing the use of ensembling methods in practice. In this work, we present a novel method that allows for fast and memory-friendly training of LLM ensembles. We show that the resulting ensembles can detect hallucinations and are a viable approach in practice as only one GPU is needed for training and inference.

Paper Structure (15 sections, 4 equations, 3 figures, 9 tables)

This paper contains 15 sections, 4 equations, 3 figures, 9 tables.

Figures (3)

  • Figure 1: (Left) The ensemble utilizes a shared matrix of pre-trained “slow weights” $U$, which are updated with LoRA matrices ($BA$) during training and then merged. Each ensemble member is represented by an individual rank-one matrix (fast weights $V$) that is combined with the shared weights using a Hadamard product. (Right) The ensemble generates uncertainty estimates, which serve as features for a classifier to determine whether the LLM's prediction is correct or hallucinated.
  • Figure 2: (Left) The average time for the models to output a token. As the ensemble size increases, BatchEnsemble becomes increasingly faster at inference than the baseline. (Right) Trainable parameters increase linearly with ensemble size for the Vanilla ensemble (Lakshminarayanan et al., 2017), while BatchEnsemble (Wen et al.) shows a negligible increase.
  • Figure C.1: (a) He initialization with $\mu=1$; (b) He initialization; (c) Xavier initialization with $\mu=1$; (d) Xavier initialization.
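
The weight construction described in the Figure 1 caption can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' implementation: the layer sizes, the random values, the `scale` factor, and the mean-one initialization of the fast weights are assumptions, and the vectors `r` and `s` are hypothetical names for the two factors of each rank-one matrix $V$.

```python
import numpy as np

def merged_slow_weights(U, B, A, scale=1.0):
    """Shared slow weights after merging the LoRA update: U + scale * (B @ A)."""
    return U + scale * (B @ A)

def member_weights(W, r, s):
    """Explicit per-member weights W_i = W * (s_i r_i^T) (Hadamard product).
    Materializes one full matrix per member; shown only for comparison."""
    fast = s[:, :, None] * r[:, None, :]  # (n_members, d_out, d_in), each slice is rank one
    return W[None, :, :] * fast

def batch_ensemble_forward(W, r, s, x):
    """Same outputs without building W_i: y_i = s_i * (W @ (r_i * x))."""
    return s * ((r * x) @ W.T)            # (n_members, d_out)

rng = np.random.default_rng(0)
d_in, d_out, rank, n_members = 8, 4, 2, 3  # illustrative sizes (assumed)

U = rng.normal(size=(d_out, d_in))         # stands in for pre-trained weights
B = rng.normal(size=(d_out, rank))         # LoRA factors; random here for illustration
A = rng.normal(size=(rank, d_in))
r = rng.normal(loc=1.0, scale=0.1, size=(n_members, d_in))   # fast weights, mean 1 (assumed)
s = rng.normal(loc=1.0, scale=0.1, size=(n_members, d_out))
x = rng.normal(size=d_in)

W = merged_slow_weights(U, B, A)
y_explicit = member_weights(W, r, s) @ x   # one matrix-vector product per member
y_fast = batch_ensemble_forward(W, r, s, x)

extra_params_batch_ensemble = n_members * (d_in + d_out)  # only r and s are per-member
extra_params_vanilla = n_members * d_in * d_out           # a full matrix per member
```

Both routes give the same outputs, but the second stores only two vectors per member, which is consistent with the negligible parameter growth reported for BatchEnsemble in Figure 2.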