SINdex: Semantic INconsistency Index for Hallucination Detection in LLMs

Samir Abdaljalil, Hasan Kurban, Parichit Sharma, Erchin Serpedin, Rachad Atat

TL;DR

The paper tackles the problem of hallucinations in large language models by introducing SINdex, a scalable, black-box framework that detects inconsistencies in model outputs without access to internal states. It combines semantic clustering of multiple responses using sentence embeddings and hierarchical agglomerative clustering with an innovative SINdex measure to quantify semantic inconsistency within clusters. Empirical results across TriviaQA, Natural Questions, SQuAD, and BioASQ show that SINdex improves AUROC over state-of-the-art baselines by up to 9.3%, with ablation analyses highlighting the contributions of embedding choice, clustering method, and hyperparameters. The approach offers a practical, model-agnostic tool for improving the reliability of LLM outputs in QA tasks and has potential for broader NLG applications and efficiency gains in large-scale deployments.

Abstract

Large language models (LLMs) are increasingly deployed across diverse domains, yet they are prone to generating factually incorrect outputs - commonly known as "hallucinations." Among existing mitigation strategies, uncertainty-based methods are particularly attractive due to their ease of implementation, independence from external data, and compatibility with standard LLMs. In this work, we introduce a novel and scalable uncertainty-based semantic clustering framework for automated hallucination detection. Our approach leverages sentence embeddings and hierarchical clustering alongside a newly proposed inconsistency measure, SINdex, to yield more homogeneous clusters and more accurate detection of hallucination phenomena across various LLMs. Evaluations on prominent open- and closed-book QA datasets demonstrate that our method achieves AUROC improvements of up to 9.3% over state-of-the-art techniques. Extensive ablation studies further validate the effectiveness of each component in our framework.
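For readers unfamiliar with the metric quoted in the abstract, AUROC can be read as the probability that a hallucinated answer receives a higher inconsistency score than a correct one. The following is a minimal sketch of that pairwise definition only; the function name, scores, and labels are made up for illustration and are not taken from the paper.

```python
def auroc(scores, labels):
    """Pairwise AUROC: chance that a positive (label 1, here 'hallucinated')
    example scores higher than a negative (label 0); ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical inconsistency scores and hallucination labels (illustrative only).
example_scores = [0.9, 0.1, 0.6, 0.4, 0.6]
example_labels = [1, 0, 1, 0, 0]
example_auroc = auroc(example_scores, example_labels)
```

A value of 0.5 corresponds to a score that ranks hallucinated and correct answers no better than chance, and 1.0 to a perfect ranking.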

Paper Structure

This paper contains 40 sections, 18 equations, 8 figures, 1 table, 1 algorithm.

Figures (8)

  • Figure 1: Illustration of the proposed Natural Language Generation hallucination detection framework, which leverages an optimized semantic clustering approach to compute semantic entropy. Part 1 involves generating multiple responses to the same question. Part 2 processes these responses by computing sentence embeddings and clustering them via hierarchical agglomerative clustering. Part 3 calculates the SINdex measure using an adjusted probability computation derived from the resulting clusters.
  • Figure 2: Visualization of clusters obtained through hierarchical agglomerative clustering (left) and NLI-based clustering (right) for the same sample. Dashed squares denote the clusters identified by each method.
  • Figure 3: Ablation experiments evaluating the effect of the number of initial generations on LLaMa-2-13b-chat.
  • Figure 4: Ablation experiments on LLaMa-2-13b-chat evaluating the sensitivity of the cosine similarity threshold for semantic clustering across all datasets.
  • Figure 5: Runtime analysis of NLI-based and agglomerative clustering over a varying number of generations.
  • ...and 3 more figures
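
The three parts described in the Figure 1 caption (sample several responses, embed and cluster them, score the resulting clusters) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the embeddings are hypothetical 2-D toy vectors standing in for real sentence embeddings, the average-linkage rule and the 0.95 cosine threshold are assumptions, and the final score is a plain Shannon entropy over cluster proportions used as a stand-in, since the adjusted probability computation behind SINdex is not given in this summary.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def agglomerative_clusters(embeddings, sim_threshold=0.95):
    """Average-linkage agglomerative clustering (assumed linkage).

    Starts with one cluster per response and repeatedly merges the most
    similar pair until no pair reaches sim_threshold (assumed value).
    Returns a list of clusters, each a list of response indices.
    """
    clusters = [[i] for i in range(len(embeddings))]

    def linkage(a, b):
        sims = [cosine(embeddings[i], embeddings[j]) for i in a for j in b]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        best, pair = -2.0, None  # cosine similarity is never below -1
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = linkage(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < sim_threshold:
            break
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


def cluster_entropy(clusters):
    """Shannon entropy of cluster proportions (stand-in score, not SINdex)."""
    n = sum(len(c) for c in clusters)
    return -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)


# Part 1 (assumed already done): four sampled answers to one question.
responses = ["Paris", "It is Paris", "Paris, France", "Lyon"]
# Part 2: hypothetical toy embeddings; a real system would use a sentence encoder.
toy_embeddings = [[1.0, 0.0], [0.98, 0.05], [0.97, 0.1], [0.0, 1.0]]
clusters = agglomerative_clusters(toy_embeddings, sim_threshold=0.95)
# Part 3: higher entropy means the sampled answers disagree more.
score = cluster_entropy(clusters)
```

In this toy run the three "Paris" variants fall into one cluster and "Lyon" into another, so the score is non-zero; if all responses landed in a single cluster the score would be zero, signalling consistent answers.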