Don't Let It Hallucinate: Premise Verification via Retrieval-Augmented Logical Reasoning

Yuehan Qin, Shawn Li, Yi Nian, Xinyan Velocity Yu, Yue Zhao, Xuezhe Ma

TL;DR

This work targets factual hallucinations in LLMs caused by false premises in user queries. It introduces a retrieval-augmented logical reasoning framework that converts queries into a structured logical form, verifies premises against a knowledge graph via retrieval-augmented generation, and informs the LLM with verification results to maintain factual consistency. Experiments on KG-FPQ demonstrate improved false-premise detection and reduced hallucinations without requiring model logits or extensive fine-tuning, with performance benefiting from logic-grounded retrieval, especially for multi-hop reasoning. The approach offers a practical pathway to integrate proactive factual checks into LLM pipelines, enhancing robustness and reliability in real-time applications.

Abstract

Large language models (LLMs) have shown substantial capacity for generating fluent, contextually appropriate responses. However, they can produce hallucinated outputs, especially when a user query includes one or more false premises, that is, claims that contradict established facts. Such premises can mislead LLMs into offering fabricated or misleading details. Existing approaches include pretraining, fine-tuning, and inference-time techniques that often rely on access to logits or address hallucinations only after they occur. These methods tend to be computationally expensive, require extensive training data, or lack proactive mechanisms to prevent hallucination before generation, which limits their efficiency in real-time applications. We propose a retrieval-based framework that identifies and addresses false premises before generation. Our method first transforms a user's query into a logical representation, then applies retrieval-augmented generation (RAG) to assess the validity of each premise using factual sources. Finally, we incorporate the verification results into the LLM's prompt to maintain factual consistency in the final output. Experiments show that this approach effectively reduces hallucinations, improves factual accuracy, and does not require access to model logits or large-scale fine-tuning.
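The three stages described above (logical form, premise verification against retrieved facts, prompt augmentation) can be illustrated with a minimal sketch. It is not the authors' implementation. The `Triple` type, the `retrieve`, `is_supported`, and `build_prompt` helpers, the relation name `award_received`, and the one-entry toy knowledge graph are all hypothetical. The logical form is written by hand here, whereas the paper derives it from the query, and exact triple matching stands in for the retrievers the paper studies. Treating a premise as false whenever its triple is absent from the graph is a simplifying closed-world assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One premise or fact in (subject, relation, object) form."""
    subject: str
    relation: str
    obj: str


FILM = "The Lord of the Rings: The Return of the King"

# Toy knowledge graph for illustration only (hypothetical relation name).
TOY_KG = frozenset({
    Triple(FILM, "award_received", "Academy Award for Best Adapted Screenplay"),
})


def retrieve(premise, kg):
    """Return KG facts that share the premise's subject and relation."""
    hits = (t for t in kg
            if t.subject == premise.subject and t.relation == premise.relation)
    return sorted(hits, key=lambda t: t.obj)


def is_supported(premise, kg):
    """Closed-world check: a premise holds only if the exact triple is in the KG."""
    return premise in kg


def build_prompt(query, premises, kg):
    """Prepend a verification note for every unsupported premise."""
    notes = []
    for p in premises:
        if is_supported(p, kg):
            continue
        evidence = "; ".join(t.obj for t in retrieve(p, kg)) or "no matching facts"
        notes.append(
            f"False premise: ({p.subject}, {p.relation}, {p.obj}). "
            f"Retrieved for this subject and relation: {evidence}."
        )
    if not notes:
        return query  # nothing to flag, so the query passes through unchanged
    return "\n".join(notes) + "\nQuestion: " + query


if __name__ == "__main__":
    # Hand-written logical form of a false-premise question (cf. Figure 1).
    question = ("Who accepted the AACTA Award for Best Adapted Screenplay "
                "for The Return of the King?")
    premise = Triple(FILM, "award_received",
                     "AACTA Award for Best Adapted Screenplay")
    print(build_prompt(question, [premise], TOY_KG))
```

Because the check runs before generation and only edits the prompt, a sketch of this kind needs neither model logits nor fine-tuning, which is the property the abstract emphasizes.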

Paper Structure

This paper contains 20 sections, 4 equations, 4 figures, 3 tables, and 1 algorithm.

Figures (4)

  • Figure 1: An LLM exhibits a factuality hallucination when faced with a false-premise question in which both entities (The Lord of the Rings: The Return of the King, AACTA Award for Best Adapted Screenplay) exist but are not correctly aligned.
  • Figure 2: Overview of our approach. Left: The original query is converted into a logical form. Middle: The logical form is used to retrieve relevant elements from the knowledge graph and detect false premises. Right: Comparison of studied retrievers for aligning logical form with the knowledge graph. The LLM generates responses with reduced hallucination given prompts with premise verification.
  • Figure 3: Comparison of performance metrics across different retrieval methods using logical forms and/or original queries.
  • Figure 4: GPT-3.5-turbo: False premise detection accuracy across single-hop and multi-hop queries. Using logical form-based RAG mainly helps detect false premises in multi-hop questions.