TigerVector: Supporting Vector Search in Graph Databases for Advanced RAGs

Shige Liu; Zhifang Zeng; Li Chen; Adil Ainihaer; Arun Ramasami; Songting Chen; Yu Xu; Mingxi Wu; Jianguo Wang

TL;DR

TigerVector addresses the need to ground LLMs in both unstructured and structured data by unifying vector search and graph query within a native graph database. It introduces an embedding data type, decoupled vector storage, and MVCC-based incremental updates, all integrated through GSQL to support declarative and procedural vector searches, graph-pattern queries, and combinations of the two. In the paper's experiments, TigerVector shows better vector-search performance and scalability than Neo4j and Amazon Neptune and is competitive with Milvus, while also delivering cost advantages and hybrid search capabilities. The work offers a practical path toward unified retrieval for RAG systems and suggests design principles applicable to other graph databases that want efficient vector search support.

Abstract

In this paper, we introduce TigerVector, a system that integrates vector search and graph query within TigerGraph, a Massively Parallel Processing (MPP) native graph database. We extend the vertex attribute type with the embedding type. To support fast vector search, we devise an MPP index framework that interoperates efficiently with the graph engine. The graph query language GSQL is enhanced to support vector type expressions and enable query compositions between vector search results and graph query blocks. These advancements elevate the expressive power and analytical capabilities of graph databases, enabling seamless fusion of unstructured and structured data in ways previously unattainable. Through extensive experiments, we demonstrate TigerVector's hybrid search capability, scalability, and superior performance compared to other graph databases (including Neo4j and Amazon Neptune) and a highly optimized specialized vector database (Milvus). TigerVector was integrated into TigerGraph v4.2, the latest release of TigerGraph, in December 2024.
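
As a rough illustration of what composing vector search results with a graph query block can mean, the sketch below runs a brute-force top-k search and then expands the hits along outgoing edges. It is a toy model under stated assumptions, not TigerVector's implementation or GSQL syntax: the names `vertex_attrs`, `embeddings`, `edges`, `vector_top_k`, and `expand` are hypothetical, the data is made up, and a linear scan stands in for the vector index.

```python
import heapq
import math

# Hypothetical toy graph. Attributes and embeddings are kept in separate
# maps that share vertex ids, loosely mirroring decoupled vector storage.
vertex_attrs = {1: {"name": "doc_a"}, 2: {"name": "doc_b"}, 3: {"name": "doc_c"}}
embeddings = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [0.9, 0.1]}
edges = {1: [2], 2: [3], 3: []}  # adjacency list of outgoing edges


def vector_top_k(query, k):
    """Return the ids of the k vertices nearest to `query` (Euclidean distance).

    Assumption: a brute-force scan replaces a real vector index.
    """
    return heapq.nsmallest(k, embeddings, key=lambda vid: math.dist(embeddings[vid], query))


def expand(seed_ids):
    """Graph step: collect the one-hop out-neighbors of the seed vertices."""
    neighbors = set()
    for vid in seed_ids:
        neighbors.update(edges[vid])
    return neighbors


# Compose the two steps: vector search first, graph traversal second.
seeds = vector_top_k([1.0, 0.0], k=2)
related = expand(seeds)
names = sorted(vertex_attrs[v]["name"] for v in related)
```

Here the vector step selects the seed set and the graph step consumes it, which is the direction of composition the abstract describes; the reverse direction (a graph pattern producing candidates that are then ranked by vector distance) would swap the two calls.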

Paper Structure (26 sections, 11 figures, 4 tables)

Introduction
Background and Related Work
TigerGraph
Vector Databases
Supporting Vector Search in Graph Databases
System Design Overview
Vector Index Design
Embedding Type
Decoupled Storage for Vectors
Incremental Update
Vector Index Choice
Vector Search Design
Vector Search
Filtered Vector Search
Vector Search on Graph Patterns
...and 11 more sections

Figures (11)

Figure 1: System Overview
Figure 2: Example of Embedding Space
Figure 3: Decoupled Storage. Vectors within a vertex segment (left) are stored separately in another embedding segment (right), while keeping the same ids.
Figure 4: Incremental Vector Vacuum Processes. The delta merge process (right) flushes delta records into a new delta file. The index merge process (left) updates the index snapshot with a sequence of delta files.
Figure 5: Distributed Query Processing. The coordinator prepares top-k vector search requests in the send queue and dispatches requests to worker servers. Each worker conducts top-k search locally and sends IDs and distances as results back to the response pool in the coordinator.
...and 6 more figures
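
The scatter-gather flow described in the Figure 5 caption can be sketched as follows. This is a minimal single-process sketch under stated assumptions: `local_top_k` and `global_top_k` are hypothetical names, each shard is a plain dictionary of id to vector, Euclidean distance is assumed, and the send queue and response pool are reduced to an ordinary list of per-worker results.

```python
import heapq
import math


def local_top_k(shard, query, k):
    """Worker side: return the k nearest (distance, id) pairs within one shard."""
    scored = ((math.dist(vec, query), vid) for vid, vec in shard.items())
    return heapq.nsmallest(k, scored)


def global_top_k(shards, query, k):
    """Coordinator side: dispatch the request to every shard, then merge.

    Assumption: workers are called sequentially here; a real system would
    send the requests in parallel and collect responses as they arrive.
    """
    responses = [local_top_k(shard, query, k) for shard in shards]
    # Each worker returns at most k candidates, so the global top-k is
    # guaranteed to be among the collected (distance, id) pairs.
    return heapq.nsmallest(k, (pair for resp in responses for pair in resp))


# Hypothetical data split across two workers.
shards = [
    {1: [0.0, 0.0], 2: [5.0, 5.0]},
    {3: [1.0, 0.0], 4: [0.0, 2.0]},
]
result = global_top_k(shards, query=[0.0, 0.0], k=2)
```

Returning only IDs and distances keeps the responses small, since the coordinator needs nothing else to pick the final k results.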
