Don't Forget to Connect! Improving RAG with Graph-based Reranking

Jialin Dong; Bahare Fatemi; Bryan Perozzi; Lin F. Yang; Anton Tsitsulin

TL;DR

The paper addresses a limitation of Retrieval-Augmented Generation (RAG) in open-domain question answering (ODQA): connections between retrieved documents are underused. It introduces G-RAG, a graph-based reranker that builds document graphs informed by Abstract Meaning Representation (AMR) and applies a graph neural network to rank the retrieved documents. The model is trained with a pairwise ranking loss and evaluated with a new set of tie-aware metrics. On Natural Questions and TriviaQA, G-RAG, especially with RL-based training, outperforms the baselines while requiring less computational overhead. Zero-shot PaLM 2 used as a reranker underperforms G-RAG, which underscores the value of a dedicated reranking architecture. By combining cross-document structure with semantic information from AMR, G-RAG improves the grounding and relevance of the documents passed to the reader, with practical implications for more efficient and accurate ODQA systems.
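To make the two training and evaluation ideas above concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the hinge form of the loss, the `margin` parameter, and the mean-rank treatment of ties are all choices made for this example, and the function names are hypothetical.

```python
def pairwise_ranking_loss(scores, labels, margin=1.0):
    """Mean hinge loss over all (positive, negative) document pairs.

    scores: reranker scores, one per retrieved document (higher = more relevant).
    labels: 1 if the document is relevant (positive), else 0.
    margin: hypothetical hyperparameter, not taken from the paper.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        return 0.0  # no comparable pairs for this question
    total = 0.0
    for pos in positives:
        for neg in negatives:
            # Penalize pairs where the positive does not beat the negative by `margin`.
            total += max(0.0, margin - (pos - neg))
    return total / (len(positives) * len(negatives))


def tie_averaged_reciprocal_rank(scores, labels):
    """Reciprocal rank of the best positive, placing tied documents at their mean rank.

    This is one simple way to make a rank metric tie-aware (an assumption for
    illustration); the paper's own metric definitions may differ.
    """
    best = 0.0
    for s, y in zip(scores, labels):
        if y != 1:
            continue
        higher = sum(1 for t in scores if t > s)
        tied = sum(1 for t in scores if t == s)  # includes the document itself
        mean_rank = higher + (tied + 1) / 2
        best = max(best, 1.0 / mean_rank)
    return best
```

The second function shows why ties matter: if a reranker assigns the same score to every document, a naive metric could credit it with rank 1 purely through sort order, whereas the mean-rank version gives it no such advantage.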

Abstract

Retrieval Augmented Generation (RAG) has greatly improved the performance of Large Language Model (LLM) responses by grounding generation with context from existing documents. These systems work well when documents are clearly relevant to a question context. But what about when a document has partial information, or less obvious connections to the context? And how should we reason about connections between documents? In this work, we seek to answer these two core questions about RAG. We introduce G-RAG, a reranker based on graph neural networks (GNNs) that sits between the retriever and the reader in RAG. Our method combines connections between documents with semantic information (via Abstract Meaning Representation graphs) to provide a context-informed ranker for RAG. G-RAG outperforms state-of-the-art approaches while having a smaller computational footprint. Additionally, we assess the performance of PaLM 2 as a reranker and find that it significantly underperforms G-RAG. This result emphasizes the importance of reranking for RAG even when using Large Language Models.
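The idea of reranking with connections between documents can be sketched as a toy example. Everything here is an assumption made for illustration: concept sets stand in for AMR graph nodes (no AMR parser is used), a single round of mean aggregation stands in for the GNN, and a dot product with a question vector stands in for the learned reranking score. None of the names below come from the paper.

```python
from itertools import combinations


def build_document_graph(doc_concepts):
    """Connect two documents when their concept sets overlap.

    doc_concepts: list of sets; each set is a stand-in for the node labels
    of a document's AMR graph (assumption: real AMR parsing is not shown).
    Returns an adjacency dict {doc_index: {neighbor_index: shared_count}}.
    """
    graph = {i: {} for i in range(len(doc_concepts))}
    for i, j in combinations(range(len(doc_concepts)), 2):
        shared = len(doc_concepts[i] & doc_concepts[j])
        if shared > 0:
            graph[i][j] = shared
            graph[j][i] = shared
    return graph


def propagate(features, graph):
    """One round of mean aggregation over each node and its neighbors."""
    updated = []
    for i, own in enumerate(features):
        group = [own] + [features[j] for j in graph[i]]
        updated.append([sum(col) / len(group) for col in zip(*group)])
    return updated


def rerank(question_vec, features, graph):
    """Score documents by dot product after propagation; return (order, scores)."""
    hidden = propagate(features, graph)
    scores = [sum(q * h for q, h in zip(question_vec, vec)) for vec in hidden]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Toy data: document 1 is only weakly relevant on its own but shares a
# concept with the clearly relevant document 0; document 2 is unconnected.
doc_concepts = [{"paris", "capital"}, {"capital", "france"}, {"banana"}]
features = [[1.0, 0.0], [0.2, 0.0], [0.3, 0.0]]
question_vec = [1.0, 0.0]

graph = build_document_graph(doc_concepts)
order, scores = rerank(question_vec, features, graph)
```

In this toy setup, document 1 would rank below document 2 on its own features, but after one propagation step it inherits signal from its connected neighbor and moves ahead. That is the intuition behind using document connections for partially informative passages.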

Paper Structure (26 sections, 13 equations, 5 figures, 5 tables)

Introduction
Related Work
RAG in ODQA.
Graphs in ODQA.
Abstract Meaning Representation (AMR).
LLMs in Reranking.
Proposed Method: G-RAG
Establishing Document Graphs via AMR
Graph Neural Networks for Reranking
Generating Node Features
Edge Features
Representation Update
Reranking Score and Training Loss
Experiments
Setting
...and 11 more sections

Figures (5)

Figure 1: G-RAG uses two graphs for reranking documents. The Abstract Meaning Representation (AMR) graph provides features for the document-level graph, and the document graph is then used for document reranking.
Figure 2: Number of nodes and edges in AMR graphs in the train/dev/test sets of the NQ and TQA datasets.
Figure 3: Number of SSSPs in AMR graphs in the train sets of the NQ and TQA datasets.
Figure 4: The pipeline of G-RAG.
Figure 5: Examples of LLM-generated relevance scores.
