
TigerVector: Supporting Vector Search in Graph Databases for Advanced RAGs

Shige Liu, Zhifang Zeng, Li Chen, Adil Ainihaer, Arun Ramasami, Songting Chen, Yu Xu, Mingxi Wu, Jianguo Wang

TL;DR

TigerVector addresses the need to ground LLMs with both unstructured and structured data by unifying vector search and graph query within a native graph database. It introduces an embedding data type, decoupled vector storage, and MVCC-based incremental updates, all integrated through GSQL to enable declarative and procedural vector searches, graph-pattern queries, and their combination. Through extensive experiments, TigerVector demonstrates superior vector-search performance and scalability relative to Neo4j and Amazon Neptune, and performance competitive with Milvus, while delivering significant cost advantages and robust hybrid search capabilities. The work offers a practical path toward unified retrieval for RAG systems and suggests design principles applicable to other graph databases seeking efficient vector search support.
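To make the storage ideas above concrete, here is a minimal Python sketch of decoupled vector storage with MVCC-style reads. It is not TigerVector's implementation: the class names `VertexSegment` and `EmbeddingSegment`, the `write`/`read` methods, and the delta-record layout `(commit_ts, vertex_id, vector)` are all hypothetical. It only illustrates two ideas from the summary: vectors live in a separate segment that reuses the vertex segment's local ids, and a reader at snapshot timestamp `t` sees the newest vector version committed at or before `t`.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class VertexSegment:
    # Regular (non-vector) attributes, keyed by local vertex id.
    attrs: dict[int, dict] = field(default_factory=dict)

@dataclass
class EmbeddingSegment:
    # Hypothetical layout: vectors are stored apart from the vertex segment
    # but keyed by the SAME local vertex ids.
    base: dict[int, list[float]] = field(default_factory=dict)
    # Versioned delta records (commit_ts, vertex_id, vector); None marks a deletion.
    deltas: list[tuple[int, int, Optional[list[float]]]] = field(default_factory=list)

    def write(self, commit_ts: int, vid: int, vec: Optional[list[float]]) -> None:
        # Updates never rewrite the base vector; they append a new version.
        self.deltas.append((commit_ts, vid, vec))

    def read(self, vid: int, snapshot_ts: int) -> Optional[list[float]]:
        # MVCC-style read: start from the base vector and take the newest delta
        # committed at or before the reader's snapshot timestamp.
        vec, best_ts = self.base.get(vid), -1
        for ts, d_vid, d_vec in self.deltas:
            if d_vid == vid and best_ts < ts <= snapshot_ts:
                vec, best_ts = d_vec, ts
        return vec
```

In this sketch, an attribute-only lookup touches just `VertexSegment`, and a vector update touches just `EmbeddingSegment`; readers holding an older snapshot keep seeing the older vector until their snapshot advances.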

Abstract

In this paper, we introduce TigerVector, a system that integrates vector search and graph query within TigerGraph, a Massively Parallel Processing (MPP) native graph database. We extend the vertex attribute type with the embedding type. To support fast vector search, we devise an MPP index framework that interoperates efficiently with the graph engine. The graph query language GSQL is enhanced to support vector type expressions and enable query compositions between vector search results and graph query blocks. These advancements elevate the expressive power and analytical capabilities of graph databases, enabling seamless fusion of unstructured and structured data in ways previously unattainable. Through extensive experiments, we demonstrate TigerVector's hybrid search capability, scalability, and superior performance compared to other graph databases (including Neo4j and Amazon Neptune) and a highly optimized specialized vector database (Milvus). TigerVector was integrated into TigerGraph v4.2, the latest release of TigerGraph, in December 2024.
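The coordinator/worker top-k pattern behind an MPP vector index, followed by a graph step that consumes its results, can be sketched as below. This is an illustrative assumption, not TigerGraph's code: each worker does an exact Euclidean scan of its partition in place of the per-segment approximate index a real engine would use, workers run sequentially rather than in parallel, and the function names `local_top_k`, `distributed_top_k`, and `expand_one_hop` are hypothetical. The merge is correct because any vertex in the global top-k must also be in the local top-k of the worker that owns it.

```python
import heapq
import math

def l2(a, b):
    # Euclidean distance between two equal-length vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def local_top_k(partition, query, k):
    # Worker side: exact scan of this worker's partition (stand-in for an ANN index).
    # Returns up to k (distance, vertex_id) pairs, nearest first.
    return heapq.nsmallest(k, ((l2(vec, query), vid) for vid, vec in partition.items()))

def distributed_top_k(partitions, query, k):
    # Coordinator side: send the same top-k request to every worker,
    # pool the returned (distance, id) pairs, and keep the global k nearest.
    candidates = []
    for partition in partitions:  # sequential here; an MPP engine would fan out in parallel
        candidates.extend(local_top_k(partition, query, k))
    return [(vid, dist) for dist, vid in heapq.nsmallest(k, candidates)]

def expand_one_hop(seed_ids, edges):
    # A graph block fed by vector-search results: collect 1-hop neighbors of the seeds.
    return sorted({dst for src in seed_ids for dst in edges.get(src, [])})
```

For example, with two workers holding `{1: [0.0, 0.0], 2: [5.0, 5.0]}` and `{3: [1.0, 0.0], 4: [9.0, 9.0]}`, a 2-NN query at the origin returns vertices 1 and 3, whose neighbors can then be expanded by the graph step. Only ids and distances cross the network, which keeps the coordinator's merge cheap.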

Paper Structure

This paper contains 26 sections, 11 figures, and 4 tables.

Figures (11)

  • Figure 1: System Overview
  • Figure 2: Example of Embedding Space
  • Figure 3: Decoupled Storage. Vectors within a vertex segment (left) are stored separately in another embedding segment (right), while keeping the same ids.
  • Figure 4: Incremental Vector Vacuum Processes. The delta merge process (right) flushes delta records into a new delta file. The index merge process (left) updates the index snapshot with a sequence of delta files.
  • Figure 5: Distributed Query Processing. The coordinator prepares top-k vector search requests in the send queue and dispatches requests to worker servers. Each worker conducts top-k search locally and sends IDs and distances as results back to the response pool in the coordinator.
  • ...and 6 more figures
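The two vacuum processes in Figure 4 can be illustrated with a simplified in-memory model. This is a hypothetical sketch, not the paper's code: `VectorStore`, `delta_merge`, `index_merge`, and `read` are illustrative names, a plain dict stands in for the vector index snapshot, and writes are assumed to arrive in commit-timestamp order.

```python
from dataclasses import dataclass, field
from typing import Optional

Delta = tuple[int, int, Optional[list[float]]]  # (commit_ts, vertex_id, vector or None)

@dataclass
class VectorStore:
    snapshot: dict[int, list[float]] = field(default_factory=dict)  # stand-in for the index snapshot
    snapshot_ts: int = 0  # newest commit timestamp reflected in the snapshot
    delta_files: list[list[Delta]] = field(default_factory=list)  # flushed, immutable delta files
    memory_deltas: list[Delta] = field(default_factory=list)  # not yet flushed

    def write(self, ts: int, vid: int, vec: Optional[list[float]]) -> None:
        # Updates land as in-memory delta records; vec=None marks a deletion.
        self.memory_deltas.append((ts, vid, vec))

    def delta_merge(self) -> None:
        # Flush the in-memory delta records into a new delta file.
        if self.memory_deltas:
            self.delta_files.append(sorted(self.memory_deltas, key=lambda r: r[0]))
            self.memory_deltas = []

    def index_merge(self) -> None:
        # Apply the sequence of delta files, oldest first, to build a new snapshot,
        # then swap it in and discard the consumed delta files.
        new_snapshot, new_ts = dict(self.snapshot), self.snapshot_ts
        for delta_file in self.delta_files:
            for ts, vid, vec in delta_file:
                if vec is None:
                    new_snapshot.pop(vid, None)
                else:
                    new_snapshot[vid] = vec
                new_ts = max(new_ts, ts)
        self.snapshot, self.snapshot_ts, self.delta_files = new_snapshot, new_ts, []

    def read(self, vid: int) -> Optional[list[float]]:
        # Latest value: snapshot, overridden by delta files, then unflushed deltas.
        vec = self.snapshot.get(vid)
        for records in [*self.delta_files, self.memory_deltas]:
            for _, d_vid, d_vec in records:
                if d_vid == vid:
                    vec = d_vec
        return vec
```

Splitting the work this way lets small, frequent flushes (`delta_merge`) stay cheap while the costlier index rebuild (`index_merge`) runs less often; reads stay correct in between by consulting the pending deltas.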