100% Elimination of Hallucinations on RAGTruth for GPT-4 and GPT-3.5 Turbo

Michael C. Wood, Adam A. Forbes

TL;DR

Hallucinations severely limit the deployment of LLMs in enterprise contexts. Acurai introduces a preventive pipeline that reformulates queries and context to avoid noun-phrase collisions, producing faithful outputs via Fully-Formatted Facts and placeholder remapping. On the RAGTruth corpus, Acurai achieves 100% hallucination-free results for GPT-4 and GPT-3.5 Turbo, with a 95% Wilson score interval of $[0.91, 1.0]$. This work suggests that input design and pre-processing can fundamentally improve trustworthiness in RAG-enabled AI systems without relying on post hoc filtering.
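To make the reported interval concrete, the following is a minimal sketch of how a 95% Wilson score interval can be computed for a proportion. The function name `wilson_interval` and the sample size used in the usage lines are illustrative assumptions; this excerpt does not state how many RAGTruth samples were evaluated. When every trial is a success, the lower bound reduces to $n / (n + z^2)$, which is why a perfect score on a finite sample still yields a lower bound below 1.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.959963984540054) -> tuple[float, float]:
    """Return the (lower, upper) Wilson score interval for a binomial proportion.

    z defaults to the two-sided 95% normal quantile.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")

    p_hat = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p_hat + z2 / (2.0 * trials)) / denom
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / trials + z2 / (4.0 * trials * trials)) / denom

    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    # Hypothetical sample size, chosen only for illustration (not taken from the paper).
    n = 40
    lower, upper = wilson_interval(successes=n, trials=n)
    print(f"n={n}: 95% Wilson interval = [{lower:.3f}, {upper:.3f}]")
```

Under this sketch, a perfect result on a few dozen samples gives a lower bound near 0.9, so the width of the interval reflects the sample size rather than any observed failure.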

Abstract

Hallucinations in large language models (LLMs) remain a critical barrier to the adoption of AI in enterprise and other high-stakes applications. Despite advancements in retrieval-augmented generation (RAG) systems, current state-of-the-art methods fail to achieve more than 80% accuracy in generating faithful and factually correct outputs, even when provided with relevant and accurate context. In this work, we introduce Acurai, a novel systematic approach that achieves 100% hallucination-free responses in LLMs by reformatting queries and context data prior to input. Leveraging a deep understanding of LLM internal representations, the importance of noun-phrase dominance, and the role of discrete functional units (DFUs), Acurai ensures alignment between input context and generated output. We validate this method using the RAGTruth corpus, demonstrating its ability to eliminate 100% of hallucinations for both GPT-4 and GPT-3.5 Turbo. Acurai sets a new standard for achieving consistent, accurate, and faithful AI responses, marking a significant step forward in the development of trustworthy AI systems.

Paper Structure

This paper contains 22 sections, 2 equations, 2 figures.

Figures (2)

  • Figure 1: Long Context RAG Performance of LLMs (databricks2024). Note that even in the best case, fully 1 in 5 answers is incorrect.
  • Figure 2: Acurai RAGTruth Results by Model