Table of Contents
Fetching ...

GraphER: An Efficient Graph-Based Enrichment and Reranking Method for Retrieval-Augmented Generation

Ruizhong Miao, Yuying Wang, Rongguang Wang, Chenyang Li, Tao Sheng, Sujith Ravi, Dan Roth

Abstract

Semantic search in retrieval-augmented generation (RAG) systems is often insufficient for complex information needs, particularly when relevant evidence is scattered across multiple sources. Prior approaches to this problem include agentic retrieval strategies, which expand the semantic search space by generating additional queries. However, these methods do not fully leverage the organizational structure of the data and instead rely on iterative exploration, which can lead to inefficient retrieval. Another class of approaches employs knowledge graphs to model non-semantic relationships through graph edges. Although effective in capturing richer proximities, such methods incur significant maintenance costs and are often incompatible with the vector stores used in most production systems. To address these limitations, we propose GraphER, a graph-based enrichment and reranking method that captures multiple forms of proximity beyond semantic similarity. GraphER independently enriches data objects during offline indexing and performs graph-based reranking over candidate objects at query time. This design does not require a knowledge graph, allowing GraphER to integrate seamlessly with standard vector stores. In addition, GraphER is retriever-agnostic and introduces negligible latency overhead. Experiments on multiple retrieval benchmarks demonstrate the effectiveness of the proposed approach.

GraphER: An Efficient Graph-Based Enrichment and Reranking Method for Retrieval-Augmented Generation

Abstract

Semantic search in retrieval-augmented generation (RAG) systems is often insufficient for complex information needs, particularly when relevant evidence is scattered across multiple sources. Prior approaches to this problem include agentic retrieval strategies, which expand the semantic search space by generating additional queries. However, these methods do not fully leverage the organizational structure of the data and instead rely on iterative exploration, which can lead to inefficient retrieval. Another class of approaches employs knowledge graphs to model non-semantic relationships through graph edges. Although effective in capturing richer proximities, such methods incur significant maintenance costs and are often incompatible with the vector stores used in most production systems. To address these limitations, we propose GraphER, a graph-based enrichment and reranking method that captures multiple forms of proximity beyond semantic similarity. GraphER independently enriches data objects during offline indexing and performs graph-based reranking over candidate objects at query time. This design does not require a knowledge graph, allowing GraphER to integrate seamlessly with standard vector stores. In addition, GraphER is retriever-agnostic and introduces negligible latency overhead. Experiments on multiple retrieval benchmarks demonstrate the effectiveness of the proposed approach.

Paper Structure

This paper contains 25 sections, 2 figures, 4 tables, 1 algorithm.

Figures (2)

  • Figure 1: Example of RAG for SQL Generation: The baseline uses semantic search alone and yields search results in which the Stores and Orders tables are ranked highly due to their semantic proximity to the question, while the Customers table does not appear among the top-5 results. In contrast, GraphER reranks the candidate tables to produce a ranked list whose top-5 results include all relevant tables.
  • Figure 2: During offline indexing, each data item is enriched with metadata that informs edge relationships. The data object is then indexed using standard retrieval methods, such as vector embedding. In the online retrieval phase, a base retriever first retrieves a set of candidate data objects and assigns an initial score to each of them. Next, a graph is constructed from these candidate objects based on the enrichments that indicate edge existence. Finally, a graph-based ranking algorithm is applied to the constructed graph, outputting the final relevance scores for the candidate data objects.