The Hybrid Multimodal Graph Index (HMGI): A Comprehensive Framework for Integrated Relational and Vector Search
Joydeep Chandra, Satyam Kumar Navneet, Yong Zhang
TL;DR
The paper addresses the lack of a unified system that supports both semantic vector similarity search and relational graph traversal over multimodal data. It proposes the Hybrid Multimodal Graph Index (HMGI), a framework that integrates vector search into a native graph database (e.g., Neo4j) so that hybrid queries run in sub-linear time, for example $O(\log N)$ for the vector search step. Its main contributions are Integrated Hybrid Query Processing, Modality-Aware Indexing, and Adaptive Updates based on MVCC delta stores and learned optimization, which the authors report deliver higher recall and lower latency than decoupled or pure-vector systems. The framework targets AI applications such as retrieval-augmented generation and multimodal analytics, where it enables scalable, expressive queries over interconnected multimodal data.
Abstract
The proliferation of complex, multimodal datasets has exposed a critical gap between the capabilities of specialized vector databases and traditional graph databases. While vector databases excel at semantic similarity search, they lack the capacity for deep relational querying. Conversely, graph databases handle complex traversals well but are not natively optimized for high-dimensional vector search. This paper introduces the Hybrid Multimodal Graph Index (HMGI), a novel framework designed to bridge this gap by creating a unified system for efficient hybrid queries on multimodal data. HMGI builds on a native graph database architecture with integrated vector search, as exemplified by platforms such as Neo4j, to combine Approximate Nearest Neighbor Search (ANNS) with expressive graph traversal queries. Key innovations of the HMGI framework include modality-aware partitioning of embeddings to optimize index structure and query performance, and a mechanism for adaptive, low-overhead index updates that supports dynamic data ingestion, drawing inspiration from the architectural principles of systems like TigerVector. By integrating semantic similarity search directly with relational context, HMGI aims to outperform pure vector databases such as Milvus in complex, relationship-heavy query scenarios and to achieve sub-linear query times for hybrid tasks.
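To make the idea of a hybrid query concrete, the following is a minimal sketch, not the HMGI implementation. It assumes a toy in-memory store in which embeddings are partitioned by modality and each hybrid query first selects the nearest nodes within one partition, then expands them along graph edges. The class `ToyHybridIndex` and its methods are hypothetical names introduced here, and an exact linear scan stands in for a real ANNS index, so this sketch does not exhibit the sub-linear behaviour the framework targets.

```python
import math
from collections import defaultdict, deque


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyHybridIndex:
    """Hypothetical illustration: per-modality vector partitions plus a graph."""

    def __init__(self):
        # modality -> {node_id: embedding}; one partition per modality
        self.partitions = defaultdict(dict)
        # node_id -> set of neighbouring node ids (undirected edges)
        self.edges = defaultdict(set)

    def add_node(self, node_id, modality, vector):
        self.partitions[modality][node_id] = vector

    def add_edge(self, src, dst):
        self.edges[src].add(dst)
        self.edges[dst].add(src)

    def nearest(self, modality, query, k):
        # Exact scan over a single partition; a real system would use ANNS here.
        ranked = sorted(
            self.partitions[modality].items(),
            key=lambda item: cosine(query, item[1]),
            reverse=True,
        )
        return [node_id for node_id, _ in ranked[:k]]

    def hybrid_query(self, modality, query, k, hops):
        """Return (seed nodes from vector search, nodes reached within `hops`)."""
        seeds = self.nearest(modality, query, k)
        seen = set(seeds)
        frontier = deque((seed, 0) for seed in seeds)
        while frontier:
            node, depth = frontier.popleft()
            if depth == hops:
                continue  # do not expand beyond the hop limit
            for neighbour in self.edges[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    frontier.append((neighbour, depth + 1))
        return seeds, seen - set(seeds)
```

In this sketch the vector step restricts the search to one modality partition, and the traversal step then supplies relational context from any modality, which is the combination the abstract describes at a high level.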
