Detecting AI Hallucinations in Finance: An Information-Theoretic Method Cuts Hallucination Rate by 92%

Mainak Singha

TL;DR

ECLIPSE introduces an entropy-capacity framework for detecting LLM hallucinations in finance by explicitly modeling the mismatch between model uncertainty and evidence quality. It combines semantic entropy estimation with a novel perplexity decomposition that exposes how evidence is used, producing a logprob-native detector that needs only API access to token-level log probabilities. A convexity guarantee that holds under mild conditions supports stable interpretation, and empirical results on a controlled financial QA dataset show strong detection (ROC AUC 0.89, versus 0.50 for an entropy-only baseline), with the perplexity features contributing most. The work indicates that accounting for evidence utilization improves hallucination detection and yields interpretable coefficients; validation across domains and on naturally occurring hallucinations remains future work.

Abstract

Large language models (LLMs) produce fluent but unsupported answers - hallucinations - limiting safe deployment in high-stakes domains. We propose ECLIPSE, a framework that treats hallucination as a mismatch between a model's semantic entropy and the capacity of available evidence. We combine entropy estimation via multi-sample clustering with a novel perplexity decomposition that measures how models use retrieved evidence. We prove that under mild conditions, the resulting entropy-capacity objective is strictly convex with a unique stable optimum. We evaluate on a controlled financial question answering dataset with GPT-3.5-turbo (n=200 balanced samples with synthetic hallucinations), where ECLIPSE achieves ROC AUC of 0.89 and average precision of 0.90, substantially outperforming a semantic entropy-only baseline (AUC 0.50). A controlled ablation with Claude-3-Haiku, which lacks token-level log probabilities, shows AUC dropping to 0.59 with coefficient magnitudes decreasing by 95% - demonstrating that ECLIPSE is a logprob-native mechanism whose effectiveness depends on calibrated token-level uncertainties. The perplexity decomposition features exhibit the largest learned coefficients, confirming that evidence utilization is central to hallucination detection. We position this work as a controlled mechanism study; broader validation across domains and naturally occurring hallucinations remains future work.

Paper Structure

This paper contains 60 sections, 1 theorem, 22 equations, 7 figures, 9 tables.

Key Result

Theorem 4

If $\alpha > \lambda a^2 / 8$, then $\mathcal{L}_{\text{total}}(H \mid C, Q)$ is strictly convex in $H$, admits a unique global minimizer $H^*(C, Q)$, and gradient descent converges from any initialization.

Figures (7)

  • Figure 1: ROC curves for ECLIPSE and entropy-only baseline on our financial QA dataset. ECLIPSE achieves AUC of 0.89, substantially outperforming the entropy-only baseline (0.50). The shaded region indicates the area under the ECLIPSE curve.
  • Figure 2: Learned coefficients sorted by absolute magnitude. Green bars indicate coefficients matching theoretical predictions; the red bar ($p_{\max}$) shows an unexpected positive sign. The perplexity decomposition features ($L_{QE}$, ratio, $\Delta L$) dominate, confirming that evidence utilization drives detection.
  • Figure 3: Ablation study showing incremental AUC improvement as features are added. Entropy alone achieves 0.50; capacity adds +0.18; perplexity decomposition adds +0.21 more. The full model achieves 0.89, representing a 78% relative improvement over entropy-only detection.
  • Figure 4: Coverage vs hallucination rate for ECLIPSE and entropy-only baseline. At any given coverage level, ECLIPSE achieves substantially lower hallucination rates. At 30% coverage, ECLIPSE reduces hallucination rate by 92% relative to entropy-only detection (3.3% vs 43.3%).
  • Figure 5: Coefficient comparison between GPT-3.5-turbo (real log probabilities) and Claude-3-Haiku (estimated). Blue bars show GPT coefficients; orange bars show Claude coefficients. Red percentages indicate what fraction of the GPT magnitude is retained. Coefficients collapse by 90-96% when real log probabilities are unavailable, and $\Delta L$ flips sign, confirming that ECLIPSE is logprob-native.
  • ...and 2 more figures
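
The coverage figures quoted for Figure 4 can be made concrete with a small sketch. It assumes one common reading of "coverage": answers are ranked by a detector's risk score, only the lowest-risk fraction is answered, and the hallucination rate is measured on that retained subset. The function names and the toy scores below are hypothetical illustrations, not the paper's code or data; only the 3.3% and 43.3% rates come from the caption.

```python
from typing import Sequence


def hallucination_rate_at_coverage(
    scores: Sequence[float], labels: Sequence[int], coverage: float
) -> float:
    """Hallucination rate among the `coverage` fraction of answers with the lowest risk.

    scores: detector risk scores (higher = more likely hallucinated).
    labels: 1 if the answer is a hallucination, 0 otherwise.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    n_keep = max(1, round(coverage * len(scores)))
    # Keep the answers the detector considers safest.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0])
    kept_labels = [label for _, label in ranked[:n_keep]]
    return sum(kept_labels) / n_keep


def relative_reduction(baseline_rate: float, method_rate: float) -> float:
    """Relative drop in hallucination rate versus a baseline."""
    return (baseline_rate - method_rate) / baseline_rate


# Toy data (hypothetical): ten answers, half of them hallucinated.
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
toy_labels = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1]
rate_30 = hallucination_rate_at_coverage(toy_scores, toy_labels, 0.3)

# Rates reported in the Figure 4 caption at 30% coverage.
reported_reduction = relative_reduction(0.433, 0.033)
```

With the caption's rates, `reported_reduction` comes out at roughly 0.92, which matches the 92% in the title. The figure is therefore a relative reduction at 30% coverage against the entropy-only baseline, not an absolute drop in hallucination rate.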

Theorems & Definitions (4)

  • Definition 1: Semantic entropy
  • Definition 2: Evidence capacity
  • Definition 3: Preferred entropy
  • Theorem 4: Stability and convexity