GraphRAFT: Retrieval Augmented Fine-Tuning for Knowledge Graphs on Graph Databases

Alfred Clemedtson, Borun Shi

TL;DR

GraphRAFT tackles hallucination and inefficiency in knowledge-graph QA by merging retrieval-augmented LLMs with native graph database querying. It learns to synthesize executable Cypher queries that fetch compact subgraphs and employs a second LLM to reason over the subgraph to yield accurate answers, all within a constrained decoding framework that guarantees syntactic and semantic correctness. The approach is modular and readily applicable to any KG stored in a graph DB, achieving state-of-the-art results on STaRK-prime and STaRK-mag with sample-efficient training. This work enables scalable, faithful KBQA over large, text-rich KGs using off-the-shelf components and graph-query engines, with demonstrated practical benefits for private domain graphs.

Abstract

Large language models have shown remarkable language processing and reasoning ability but are prone to hallucinate when asked about private data. Retrieval-augmented generation (RAG) retrieves relevant data that fit into an LLM's context window and prompts the LLM for an answer. GraphRAG extends this approach to structured Knowledge Graphs (KGs) and to questions regarding entities multiple hops away. The majority of recent GraphRAG methods either overlook the retrieval step or have ad hoc retrieval processes that are abstract or inefficient. This prevents them from being adopted when the KGs are stored in graph databases supporting graph query languages. In this work, we present GraphRAFT, a retrieve-and-reason framework that fine-tunes LLMs to generate provably correct Cypher queries to retrieve high-quality subgraph contexts and produce accurate answers. Our method is the first such solution that can be taken off-the-shelf and used on KGs stored in native graph DBs. Benchmarks suggest that our method is sample-efficient and scales with the availability of training data. Our method achieves significantly better results than all state-of-the-art models across all four standard metrics on two challenging Q&A benchmarks over large text-attributed KGs.

Paper Structure

This paper contains 28 sections, 4 theorems, 2 equations, 5 figures, 5 tables.

Key Result

Lemma 1

(Informal) If a query is invalid, it will not be generated. With beam width = 1, constrained decoding acts as greedy decoding over the valid queries. With beam width = M, exactly the set of all valid queries is generated.
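The behaviour stated in the lemma can be illustrated with a small sketch. It is not the paper's implementation: the token sequences, the context-independent toy logits, and every function name below are hypothetical, and M is assumed here to mean the number of valid queries. The valid queries are stored in a prefix tree, tokens outside the tree are masked before the softmax, and a plain beam search runs over what remains.

```python
import math
from typing import Dict, List, Tuple

EOS = "<eos>"
Seq = Tuple[str, ...]

# Hypothetical valid queries, already split into toy "tokens".
VALID_QUERIES: List[Seq] = [
    ("name:", '"X-linked', "ichthyosis", "with", "deficiency", '")'),
    ("name:", '"X-linked', "ichthyosis", "syndrome", '")'),
    ("name:", '"ichthyosis', "vulgaris", '")'),
]

# Toy logits that ignore the prefix (an assumption made for brevity).
# The closing symbols '")' score highest, so an unconstrained model would
# emit them right after "ichthyosis" and produce an invalid name.
LOGITS: Dict[str, float] = {
    '")': 3.0, "with": 2.0, '"X-linked': 2.0, "syndrome": 1.0,
    '"ichthyosis': 1.0, "name:": 1.0, "ichthyosis": 1.0,
    "deficiency": 1.0, "vulgaris": 1.0, EOS: 0.0,
}


def build_trie(valid_queries: List[Seq]) -> dict:
    """Store every valid query (plus an end marker) in a nested-dict trie."""
    root: dict = {}
    for query in valid_queries:
        node = root
        for token in query + (EOS,):
            node = node.setdefault(token, {})
    return root


def allowed_next(trie: dict, prefix: Seq) -> List[str]:
    """Tokens that keep `prefix` a prefix of at least one valid query."""
    node = trie
    for token in prefix:
        node = node[token]
    return sorted(node)


def masked_log_probs(logits: Dict[str, float], allowed: List[str]) -> Dict[str, float]:
    """Log-softmax restricted to the allowed tokens; all others are masked out."""
    log_z = math.log(sum(math.exp(logits[t]) for t in allowed))
    return {t: logits[t] - log_z for t in allowed}


def constrained_beam_search(trie: dict, logits: Dict[str, float], beam_width: int) -> List[Seq]:
    """Beam search in which each step may only extend along the trie."""
    beams: List[Tuple[Seq, float]] = [((), 0.0)]
    while any(not seq or seq[-1] != EOS for seq, _ in beams):
        candidates: List[Tuple[Seq, float]] = []
        for seq, score in beams:
            if seq and seq[-1] == EOS:  # finished beams are carried over unchanged
                candidates.append((seq, score))
                continue
            step = masked_log_probs(logits, allowed_next(trie, seq))
            for token, log_prob in step.items():
                candidates.append((seq + (token,), score + log_prob))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return [seq[:-1] for seq, _ in beams]  # strip the end marker


TRIE = build_trie(VALID_QUERIES)
greedy = constrained_beam_search(TRIE, LOGITS, beam_width=1)
everything = constrained_beam_search(TRIE, LOGITS, beam_width=len(VALID_QUERIES))
```

In this toy setting, `greedy` contains the single query reached by always taking the best allowed token, and `everything` contains each valid query once. The second result holds because every beam entry is a prefix of a distinct valid query, so the beam never holds more than M candidates and nothing is pruned.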

Figures (5)

  • Figure 1: An example Cypher query. It takes as user input a list variable source_names and another list target_ids. It iterates through them and finds all two-hop neighbours of each source_name node, requiring the second-hop node to be distinct from the source. The query returns aggregate information about the subgraph, such as the labels and types of nodes and edges, and a count of how many second-hop nodes have ids that appear in the user-defined node id list.
  • Figure 2: An example of creating ground-truth Cypher for a QA. In Step 1, a few-shot LLM produces candidate entities, which we ground against $\mathcal{G}$ in the DB using a vector index. Step 2 shows part of the subgraph around the entity and answer nodes. In Step 3, we use the DB to execute all one-hop and two-hop queries around each entity, as well as all length-two paths connecting the two entities. We aggregate the hits and the number of nodes for each query and rank them.
  • Figure 3: An example of grounded constrained decoding. For the given question, we tokenize all possible queries around its identified entities. At each step during generation, our logits processor masks out invalid tokens. For example, after "ichthyosis", the LLM would otherwise have generated the symbols ") because they have the highest logit. Our processor masks them out because the resulting predicate name, "X-linked ichthyosis", is invalid.
  • Figure 4: An example prompt that describes a local subgraph retrieved by Cypher queries around identified entities. The prompt contains both the textual information and the patterns used to retrieve it, which encode topology information.
  • Figure 5: The impact of entity resolution on the quality of Cypher queries.
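The ranking step described in the Figure 2 caption can be sketched on an in-memory toy graph. This is an illustration under stated assumptions, not the authors' pipeline: the graph, the entity and answer names, the helper functions, and the tie-breaking rule (more hits first, then fewer retrieved nodes) are all hypothetical, and a real setup would run Cypher against the graph DB instead of Python set operations.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Adjacency = Dict[str, Set[str]]


def build_adjacency(edges: List[Tuple[str, str]]) -> Adjacency:
    """Undirected adjacency map built from an edge list."""
    adj: Adjacency = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def one_hop(adj: Adjacency, entity: str) -> Set[str]:
    """All direct neighbours of the entity."""
    return set(adj.get(entity, set()))


def two_hop(adj: Adjacency, entity: str) -> Set[str]:
    """Second-hop nodes, required to be distinct from the source node."""
    return {w for v in adj.get(entity, set()) for w in adj.get(v, set()) if w != entity}


def connecting(adj: Adjacency, a: str, b: str) -> Set[str]:
    """Middle nodes of the length-two paths a - m - b."""
    return adj.get(a, set()) & adj.get(b, set())


def rank_candidates(adj: Adjacency, entities: List[str], answers: Set[str]) -> List[Tuple[str, int, int]]:
    """Return (query name, hits, retrieved nodes), best candidate first."""
    results: Dict[str, Set[str]] = {}
    for e in entities:
        results[f"1-hop({e})"] = one_hop(adj, e)
        results[f"2-hop({e})"] = two_hop(adj, e)
    for a, b in combinations(entities, 2):
        results[f"path2({a},{b})"] = connecting(adj, a, b)
    scored = [(name, len(nodes & answers), len(nodes)) for name, nodes in results.items()]
    # Assumed ranking: more answer hits first, then the smaller result set.
    return sorted(scored, key=lambda r: (-r[1], r[2], r[0]))


# Hypothetical toy data.
EDGES = [
    ("drugA", "gene1"), ("drugA", "gene2"),
    ("diseaseB", "gene1"), ("diseaseB", "gene3"),
    ("gene1", "pathway1"), ("gene2", "pathway1"),
]
ADJ = build_adjacency(EDGES)
ENTITIES = ["drugA", "diseaseB"]
ANSWERS = {"gene1"}

ranked = rank_candidates(ADJ, ENTITIES, ANSWERS)
```

On this toy graph the connecting-path pattern ranks first, since it retrieves the answer node while returning the fewest nodes.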

Theorems & Definitions (7)

  • Lemma 1
  • Lemma 1.1
  • proof
  • Lemma 1.2
  • proof
  • Lemma 1.3
  • proof