ReFactX: Scalable Reasoning with Reliable Facts via Constrained Generation

Riccardo Pozzi, Matteo Palmonari, Andrea Coletta, Luigi Bellomarini, Jens Lehmann, Sahar Vahdati

TL;DR

This work tackles knowledge gaps and hallucinations in large language models by introducing ReFactX, a retriever-free approach that lets LLMs access very large knowledge bases through constrained generation guided by a pre-built, disk-backed prefix tree. By restricting decoding to tokens that still lead to a valid knowledge-base fact, ReFactX grounds its outputs with little added latency while scaling to about 800 million facts from Wikidata. The method is integrated into QA workflows through an in-context prompting scheme and a dedicated Fact command, enabling precise, fact-based reasoning without additional retrieval models. Empirical evaluation across multiple datasets shows competitive performance, with notable gains on multi-hop and generic questions at a modest generation-time overhead, highlighting ReFactX’s potential as a scalable grounding alternative for KB-intensive tasks.
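The TL;DR describes the prefix tree as pre-built and disk-backed. As a rough illustration of what that can mean, the sketch below stores each edge of the tree (a token-id prefix plus one allowed next token) as a row in an SQLite table, so the allowed continuations of any prefix come from an indexed lookup instead of an in-memory structure. The schema, the space-separated prefix keys, and the names `open_index`, `add_fact`, and `allowed_next` are hypothetical; this is not ReFactX's actual storage layout.

```python
import sqlite3

# Hypothetical schema: one row per (prefix, next_token) edge of the prefix tree.
# Prefixes are stored as space-separated token-id strings so they can be
# looked up with a plain equality query (an assumption, not ReFactX's format).

def open_index(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS edges ("
        " prefix TEXT NOT NULL,"
        " next_token INTEGER NOT NULL,"
        " PRIMARY KEY (prefix, next_token))"
    )
    return conn

def key(prefix):
    # The empty prefix (tree root) maps to the empty string.
    return " ".join(map(str, prefix))

def add_fact(conn, token_ids):
    # Insert every edge along the path of one tokenized fact;
    # edges shared with earlier facts are ignored by the primary key.
    rows = [(key(token_ids[:i]), token_ids[i]) for i in range(len(token_ids))]
    conn.executemany("INSERT OR IGNORE INTO edges VALUES (?, ?)", rows)

def allowed_next(conn, prefix):
    # Token ids that keep the sequence on a path to a stored fact.
    cur = conn.execute(
        "SELECT next_token FROM edges WHERE prefix = ? ORDER BY next_token",
        (key(prefix),),
    )
    return [row[0] for row in cur]
```

The decoder only needs this contract: a prefix goes in and the list of allowed next token ids comes out. An empty list means the prefix already spells out a complete fact, assuming no stored fact is a prefix of another.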

Abstract

Knowledge gaps and hallucinations are persistent challenges for Large Language Models (LLMs), which generate unreliable responses when they lack the information needed to fulfill user instructions. Existing approaches, such as Retrieval-Augmented Generation (RAG) and tool use, aim to address these issues by incorporating external knowledge. Yet they rely on additional models or services, which results in complex pipelines and potential error propagation, and they often require the model to process a large number of tokens. In this paper, we present a scalable method that enables LLMs to access external knowledge without depending on retrievers or auxiliary models. Our approach uses constrained generation with a pre-built prefix-tree index. Triples from a Knowledge Graph are verbalized into textual facts, tokenized, and indexed in a prefix tree for efficient access. During inference, to acquire external knowledge, the LLM generates facts through constrained generation, which allows only token sequences that form an existing fact. We evaluate our proposal on Question Answering and show that it scales to large knowledge bases (800 million facts), adapts to domain-specific data, and achieves effective results. These gains come with minimal generation-time overhead. ReFactX code is available at https://github.com/rpo19/ReFactX.
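The indexing pipeline in the abstract (verbalize triples, tokenize them, insert the token ids into a prefix tree) can be sketched in a few lines. This is a minimal in-memory sketch: the `<subject> <predicate> <object> .` verbalization mirrors the example fact shown in Figure 2, while `ToyTokenizer` is a hypothetical stand-in for the LLM's real tokenizer, and `build_fact_tree` and `contains` are illustrative names, not ReFactX's API.

```python
import re

def verbalize(subject, predicate, obj):
    # Fact format modeled on the example in Figure 2.
    return f"<{subject}> <{predicate}> <{obj}> ."

class ToyTokenizer:
    """Hypothetical stand-in for an LLM tokenizer: splits on angle brackets
    and whitespace and assigns a new integer id to each unseen piece."""
    def __init__(self):
        self.vocab = {}

    def encode(self, text):
        pieces = re.findall(r"[<>]|[^<>\s]+", text)
        return [self.vocab.setdefault(p, len(self.vocab)) for p in pieces]

class TrieNode:
    def __init__(self):
        self.children = {}    # token id -> TrieNode
        self.is_fact = False  # True if the path to this node spells a full fact

def build_fact_tree(triples, tokenizer):
    """Verbalize each (subject, predicate, object) triple, tokenize it,
    and insert its token ids into a prefix tree."""
    root = TrieNode()
    for s, p, o in triples:
        node = root
        for tok in tokenizer.encode(verbalize(s, p, o)):
            node = node.children.setdefault(tok, TrieNode())
        node.is_fact = True
    return root

def contains(root, token_ids):
    """Return True only if token_ids is a complete fact stored in the tree."""
    node = root
    for tok in token_ids:
        node = node.children.get(tok)
        if node is None:
            return False
    return node.is_fact

triples = [
    ("Danny Boyle", "date of birth", "1956-10-20"),
    ("Danny Boyle", "occupation", "film director"),
]
tokenizer = ToyTokenizer()
tree = build_fact_tree(triples, tokenizer)
```

Because both facts start with `<Danny Boyle> <`, they share a single path up to that point and only branch at the predicate. This sharing of common prefixes is what lets a prefix tree store many facts about the same subjects compactly.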

Paper Structure

This paper contains 14 sections, 7 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: ReFactX Answering an Open-domain Question. The LLM sketches a plan, makes two Fact calls and, using constrained generation (blue underline), inserts valid facts from a Wikidata-based prefix tree before giving the final answer.
  • Figure 2: Constrained decoding steers the LLM toward the correct fact. At each step, the constrained decoder chooses the highest-probability token that still completes a Wikidata fact, avoiding invalid branches and yielding "<Danny Boyle> <date of birth> <1956-10-20> .".
  • Figure 3: Generating facts from the fact tree. Token ids are surrounded by rounded rectangles. Each arrow $S \xrightarrow{n} t_{k+1}$ represents the selection of the next token $t_{k+1}$ given the current sequence $S = t_0 \ldots t_k$, where $n$ is the number of leaves reachable from $S$.
  • Figure 4: System prompt for the Wikidata KB. The prompt instructs the LLM to reason step by step, issue Fact calls to access facts from the Wikidata fact tree, and deliver an answer only after gathering evidence; otherwise, it must respond "I don’t know."
  • Figure 5: Question and answer type distribution across the four evaluation datasets. Stacked bars show how each benchmark (Bank, Mintaka, 2Wiki, WebQSP) varies in its mix of question categories (generic, comparative, multi-hop, and others) and answer forms (generic, yes/no, enumeration).
  • ...and 1 more figure
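Figures 2 and 3 describe the decoding loop: at each step the decoder keeps only the tokens that still lead to a fact in the tree and picks the highest-scoring one, and each prefix $S$ is annotated with the number $n$ of leaves (facts) reachable from it. The sketch below implements that greedy loop over a toy tree. Here `toy_llm` is a hypothetical scoring function standing in for the LLM's next-token logits, the facts are toy data, and tokens are whole strings rather than token ids for readability.

```python
import math

class Node:
    """One prefix S in the fact tree."""
    def __init__(self):
        self.children = {}  # next token -> Node
        self.leaves = 0     # n: number of facts (leaves) reachable from this prefix

def build(facts):
    """Insert tokenized facts; assumes facts are distinct and none is a prefix of another."""
    root = Node()
    for fact in facts:
        node = root
        node.leaves += 1
        for tok in fact:
            node = node.children.setdefault(tok, Node())
            node.leaves += 1
    return root

def constrained_greedy(root, score):
    """Greedy decoding restricted to paths in the tree.

    score(prefix) returns a dict token -> logit (stand-in for the LLM).
    Returns the generated fact and the leaf count n of each prefix visited.
    """
    node, prefix, leaf_counts = root, [], []
    while node.children:
        logits = score(prefix)
        # Mask: only children of the current node are candidates; tokens the
        # model scores outside the tree are never considered, and allowed
        # tokens the model did not score get -inf.
        masked = {t: logits.get(t, -math.inf) for t in node.children}
        best = max(masked, key=masked.get)
        leaf_counts.append(node.leaves)
        prefix.append(best)
        node = node.children[best]
    return prefix, leaf_counts

def toy_llm(prefix):
    # Hypothetical logits; the model sometimes prefers tokens outside the tree.
    table = {
        (): {"<Danny Boyle>": 2.0, "<Danny Glover>": 1.0},
        ("<Danny Boyle>",): {"<born on>": 3.0, "<date of birth>": 2.5, "<occupation>": 1.0},
        ("<Danny Boyle>", "<date of birth>"): {"<1956-10-21>": 2.0, "<1956-10-20>": 1.9},
    }
    return table.get(tuple(prefix), {".": 0.0})

facts = [
    ["<Danny Boyle>", "<date of birth>", "<1956-10-20>", "."],
    ["<Danny Boyle>", "<occupation>", "<film director>", "."],
    ["<Danny Glover>", "<occupation>", "<actor>", "."],
]
root = build(facts)
sequence, leaf_counts = constrained_greedy(root, toy_llm)
print(" ".join(sequence))  # <Danny Boyle> <date of birth> <1956-10-20> .
print(leaf_counts)         # [3, 2, 1, 1]
```

At the second step the toy model's favourite continuation `<born on>` is not in the tree, and at the third step its preferred date `<1956-10-21>` is absent as well, so neither can be selected and the decoder falls back to the best valid token, in the spirit of Figure 2. The leaf counts shrink from 3 to 1 as the prefix narrows down to a single fact.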