
Zero-knowledge LLM hallucination detection and mitigation through fine-grained cross-model consistency

Aman Goel, Daniel Schwartz, Yanjun Qi

TL;DR

Finch-Zk presents an integrated, zero-knowledge framework that combines cross-model consistency with a fine-grained, multi-stage mitigation pipeline to detect and correct LLM hallucinations without external knowledge sources. It leverages diverse prompt variations and cross-architecture sampling to reveal fine-grained inaccuracies, then applies targeted block-level corrections and cross-model reflection to preserve accurate content and improve overall factuality. Empirical results show robust detection gains on FELM (6–39% in F1) and mitigation improvements of up to 9 percentage points in answer accuracy on GPQA-diamond, underscoring practical applicability in production environments. While the approach offers deployment-ready capabilities and strong empirical support, it incurs higher latency and cost and relies on cross-model consensus, suggesting a continued need for human oversight in high-stakes settings.

Abstract

Large language models (LLMs) have demonstrated impressive capabilities across diverse tasks, but they remain susceptible to hallucinations: generating content that appears plausible but contains factual inaccuracies. We present Finch-Zk, a black-box framework that leverages fine-grained cross-model consistency to detect and mitigate hallucinations in LLM outputs without requiring external knowledge sources. Finch-Zk introduces two key innovations: 1) a cross-model consistency checking strategy that reveals fine-grained inaccuracies by comparing responses generated by diverse models from semantically equivalent prompts, and 2) a targeted mitigation technique that applies precise corrections to problematic segments while preserving accurate content. Experiments on the FELM dataset show that Finch-Zk improves hallucination detection F1 scores by 6–39% compared to existing approaches. For mitigation, Finch-Zk achieves an improvement of up to 9 absolute percentage points in answer accuracy on the GPQA-diamond dataset when applied to state-of-the-art models such as Llama 4 Maverick and Claude 4 Sonnet. Extensive evaluation on multiple datasets demonstrates that Finch-Zk provides a practical, deployment-ready safeguard for enhancing factual reliability in production LLM systems.
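The abstract describes the detect-then-correct idea only at a high level. As a rough illustration, the Python sketch below shows one way block-level cross-model consistency could be wired up, under assumptions that do not come from the paper: a response is split into sentence blocks; each block is scored by the fraction of sampled responses (in Finch-Zk's setting, outputs from different models on semantically equivalent prompts) that fail to support it; and a crude word-overlap heuristic stands in for whatever model-based comparison Finch-Zk actually performs. All function names (`split_blocks`, `overlap_judge`, `score_blocks`, `mitigate`), the 0.8 overlap threshold, the 0.5 cutoff, and the stub corrector are hypothetical.

```python
import re
from typing import Callable, List, Set

# Hypothetical judge type: returns True if `reference` supports `block`.
Judge = Callable[[str, str], bool]
# Hypothetical corrector type: rewrites one flagged block given the samples.
Corrector = Callable[[str, List[str]], str]


def split_blocks(response: str) -> List[str]:
    """Split a response into sentence-level blocks (naive punctuation split)."""
    return [b for b in re.split(r"(?<=[.!?])\s+", response.strip()) if b]


def words(text: str) -> Set[str]:
    """Lowercased word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def overlap_judge(block: str, reference: str, threshold: float = 0.8) -> bool:
    """Toy stand-in for a model-based judge: the block counts as supported
    if at least `threshold` of its distinct words appear in the reference."""
    block_words = words(block)
    if not block_words:
        return True
    return len(block_words & words(reference)) / len(block_words) >= threshold


def score_blocks(response: str, samples: List[str], judge: Judge) -> List[float]:
    """For each block, the fraction of cross-model samples that do NOT support it."""
    if not samples:
        raise ValueError("need at least one sampled response")
    scores = []
    for block in split_blocks(response):
        unsupported = sum(not judge(block, s) for s in samples)
        scores.append(unsupported / len(samples))
    return scores


def mitigate(response: str, samples: List[str], judge: Judge,
             corrector: Corrector, cutoff: float = 0.5) -> str:
    """Rewrite only blocks whose score exceeds `cutoff`; keep the rest verbatim."""
    blocks = split_blocks(response)
    scores = score_blocks(response, samples, judge)
    fixed = [corrector(b, samples) if s > cutoff else b
             for b, s in zip(blocks, scores)]
    return " ".join(fixed)


# Usage with made-up data: the second block contains a wrong year.
response = "The Eiffel Tower is in Paris. It was completed in 1899."
samples = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "Completed in 1889, the Eiffel Tower stands in Paris.",
    "The Eiffel Tower, in Paris, was finished in 1889.",
]
print(score_blocks(response, samples, overlap_judge))  # [0.0, 1.0]

# Hypothetical stub corrector; a real one would prompt a model with the samples.
stub = lambda block, samples: "It was completed in 1889."
print(mitigate(response, samples, overlap_judge, stub))
```

Only the flagged block is rewritten while the supported block passes through unchanged, which mirrors the abstract's emphasis on preserving accurate content. A word-overlap judge is far too brittle for real use (paraphrases and negations defeat it); it appears here only to keep the sketch self-contained and runnable.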

Paper Structure

This paper contains 19 sections, 3 equations, 11 figures, and 5 tables.

Figures (11)

  • Figure 1: Overview of Finch-Zk
  • Figure 2: Motivating Example
  • Figure 3: Prompt used in Rephrase prompt variation
  • Figure 4: Prompt used in Expand-Before prompt variation
  • Figure 5: Prompt used in Expand-After prompt variation
  • ...and 6 more figures