SLM Meets LLM: Balancing Latency, Interpretability and Consistency in Hallucination Detection

Mengya Hu, Rui Xu, Deren Lei, Yaxi Li, Mingyu Wang, Emily Ching, Eslam Kamal, Alex Deng

TL;DR

This work tackles real-time hallucination detection by pairing a small language model (SLM), which makes fast initial judgments, with a large language model (LLM) that acts as a constrained reasoner and generates explanations. It introduces a categorized prompting strategy and a downstream consistency analysis to align LLM explanations with SLM decisions and to handle cases where the two disagree. Empirical results on four open-source datasets show that categorizing inconsistencies and applying filtering substantially improve alignment and yield meaningful feedback for refining the SLM. The framework offers a practical path toward latency-aware, interpretable hallucination detection and demonstrates how LLM-based explanations can inform iterative improvement of smaller detectors.

Abstract

Large language models (LLMs) are highly capable but face latency challenges in real-time applications such as online hallucination detection. To address this, we propose a framework that uses a small language model (SLM) classifier for initial detection, followed by an LLM acting as a constrained reasoner that generates detailed explanations for the detected hallucinated content. This study optimizes real-time interpretable hallucination detection by introducing prompting techniques that align LLM-generated explanations with SLM decisions. Empirical results demonstrate the framework's effectiveness, thereby enhancing the overall user experience.
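The two-stage routing described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `Outcome`, `Result`, `detect`, `toy_slm`, and `toy_reasoner` are hypothetical, and both models are replaced by stand-in functions so the control flow can be run in isolation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Tuple


class Outcome(Enum):
    NO_HALLUCINATION = "no_hallucination"  # SLM found nothing; the LLM is never called
    EXPLAINED = "explained"                # reasoner's explanation agrees with the SLM
    INCONSISTENT = "inconsistent"          # reasoner disagrees; filter out or log as feedback


@dataclass
class Result:
    outcome: Outcome
    explanation: Optional[str] = None


def detect(
    source: str,
    hypothesis: str,
    slm_flags: Callable[[str, str], bool],
    llm_reason: Callable[[str, str], Tuple[bool, str]],
) -> Result:
    """Route a (source, hypothesis) pair through the SLM, then the LLM if needed."""
    # Fast path: every request goes through the SLM; unflagged pairs return here.
    if not slm_flags(source, hypothesis):
        return Result(Outcome.NO_HALLUCINATION)

    # Slow path: only pairs flagged by the SLM pay the LLM latency cost.
    agrees, explanation = llm_reason(source, hypothesis)
    if agrees:
        return Result(Outcome.EXPLAINED, explanation)
    return Result(Outcome.INCONSISTENT, explanation)


# Stand-ins for the real models (assumptions for illustration only).
def toy_slm(source: str, hypothesis: str) -> bool:
    return hypothesis not in source


def toy_reasoner(source: str, hypothesis: str) -> Tuple[bool, str]:
    return True, "The hypothesis is not supported by the source."


result = detect("Paris is the capital of France.", "Paris is the capital",
                toy_slm, toy_reasoner)
print(result.outcome.value)
```

Keeping the LLM call behind the SLM check is what bounds latency in this sketch: the expensive model runs only on the subset of inputs the classifier flags.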

Paper Structure (12 sections, 5 figures, 4 tables)

This paper contains 12 sections, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Hallucination detection with an LLM as a constrained reasoner: Pairs of grounding sources and hypotheses are fed into an SLM classifier. In most cases, where no hallucination is detected, the no-hallucination decision is returned to the client directly. If the SLM detects a hallucination, an LLM-based constrained reasoner is invoked to interpret the SLM's decision. If the reasoner's analysis agrees with the initial detection, the explanation is relayed to the client along with the original hypothesis. Otherwise, the potentially problematic hypothesis is filtered out or used as feedback to refine the upstream SLM.
  • Figure 2: Inconsistency rate comparison: The Categorized approach consistently outperforms both the Vanilla and Fallback methods, with a significant drop in inconsistency after filtering is applied.
  • Figure 3: Vanilla prompt.
  • Figure 4: Fallback prompt.
  • Figure 5: Categorized prompt.
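
The inconsistency-rate comparison in Figure 2 can be illustrated with a small sketch. The definition used here is an assumption, not taken from the paper: a record counts as inconsistent when the reasoner's explanation does not support the SLM's hallucination decision, and filtering drops the records that the reasoner explicitly flagged as such. The field names `inconsistent` and `flagged`, the function `inconsistency_rate`, and the toy records are all hypothetical.

```python
from typing import Dict, List


def inconsistency_rate(records: List[Dict[str, bool]], apply_filter: bool = False) -> float:
    """Fraction of SLM-flagged cases whose LLM explanation contradicts the SLM.

    With apply_filter=True, inconsistent cases that the reasoner flagged itself
    are treated as filtered out and no longer count against the rate.
    The denominator stays the full record count (an assumption of this sketch).
    """
    if not records:
        return 0.0
    remaining = [
        r for r in records
        if r["inconsistent"] and not (apply_filter and r["flagged"])
    ]
    return len(remaining) / len(records)


# Toy data for illustration only; these are not results from the paper.
toy_records = [
    {"inconsistent": False, "flagged": False},
    {"inconsistent": False, "flagged": False},
    {"inconsistent": True, "flagged": True},   # reasoner signalled the disagreement
    {"inconsistent": True, "flagged": False},  # disagreement slipped through unflagged
]

before = inconsistency_rate(toy_records)
after = inconsistency_rate(toy_records, apply_filter=True)
print(before, after)  # 0.5 0.25
```

Under this assumed definition, a prompt that gets the reasoner to flag its own disagreements is what makes filtering effective, since only unflagged disagreements remain after filtering.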