The novel vector database
Tom. Lou
TL;DR
The paper addresses the inefficiency of updating on-disk graph-based approximate nearest neighbor search (ANNS) indexes under coupled storage, where vectors are stored inside the index. It proposes a decoupled architecture that separates graph topology from vector data, so that topology updates no longer trigger redundant vector I/O. The reported gains over the traditional coupled architecture are 10.05x faster insertions, 6.89x faster deletions, and 2.66x higher query efficiency. The work also analyzes the update-query trade-off, reports performance gains over state-of-the-art disk-based systems, and positions the decoupled design as a practical solution for billion-scale dynamic ANNS workloads.
Abstract
On-disk graph-based indexes are widely used in approximate nearest neighbor (ANN) search systems for large-scale, high-dimensional vectors. However, traditional coupled storage methods, which store vectors within the index, are inefficient for index updates. Coupled storage incurs excessive redundant vector reads and writes whenever the graph topology is updated, leading to significant invalid I/O. To address this issue, we propose a decoupled storage architecture that stores the graph topology separately from the vector data. Experimental results show that the decoupled architecture improves update speed by 10.05x for insertions and 6.89x for deletions, while its three-stage query and incremental reordering improve query efficiency by 2.66x compared to the traditional coupled architecture.
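To make the redundant-I/O argument concrete, the following is a minimal back-of-the-envelope sketch, not the paper's implementation. It compares the bytes moved when the neighbor lists of a set of nodes are rewritten under a coupled layout (vector and neighbor list in one record) and under a decoupled layout (neighbor list only). The `Layout` class, the `update_io_bytes` function, and all parameter values (128-dimensional float32 vectors, maximum degree 64, 4-byte IDs) are illustrative assumptions. The model ignores disk page granularity and caching, so it does not reproduce the speedups reported above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Hypothetical on-disk record layout; every value is an assumption."""
    dim: int = 128             # vector dimensionality
    bytes_per_value: int = 4   # float32 components
    max_degree: int = 64       # neighbor slots per node
    id_bytes: int = 4          # uint32 neighbor IDs

    @property
    def vector_bytes(self) -> int:
        return self.dim * self.bytes_per_value

    @property
    def topology_bytes(self) -> int:
        # 4-byte degree counter followed by the neighbor ID slots
        return 4 + self.max_degree * self.id_bytes

    @property
    def coupled_record_bytes(self) -> int:
        # Coupled storage keeps the vector next to its neighbor list
        return self.vector_bytes + self.topology_bytes


def update_io_bytes(layout: Layout, touched_nodes: int, coupled: bool) -> int:
    """Bytes read plus bytes written to rewrite the neighbor lists of
    `touched_nodes` nodes, counting one read and one write per record."""
    record = layout.coupled_record_bytes if coupled else layout.topology_bytes
    return 2 * touched_nodes * record


if __name__ == "__main__":
    layout = Layout()
    coupled = update_io_bytes(layout, touched_nodes=64, coupled=True)
    decoupled = update_io_bytes(layout, touched_nodes=64, coupled=False)
    print(f"coupled:   {coupled} bytes")
    print(f"decoupled: {decoupled} bytes")
    print(f"ratio:     {coupled / decoupled:.2f}x")
```

Under these assumptions each coupled record carries 512 bytes of vector data that a topology-only update never changes, which is the redundant I/O the abstract describes. The gap widens as the vector dimensionality grows relative to the node degree.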
