Incremental IVF Index Maintenance for Streaming Vector Search

Jason Mohoney, Anil Pacaci, Shihabur Rahman Chowdhury, Umar Farooq Minhas, Jeffery Pound, Cedric Renggli, Nima Reyhani, Ihab F. Ilyas, Theodoros Rekatsinas, Shivaram Venkataraman

TL;DR

Ada-IVF is an incremental indexing methodology for Inverted File (IVF) indexes. Compared with state-of-the-art dynamic IVF index maintenance strategies, it achieves an average of 2x and up to 5x higher update throughput across a range of benchmark workloads.

Abstract

The prevalence of vector similarity search in modern machine learning applications and the continuously changing nature of data processed by these applications necessitate efficient and effective index maintenance techniques for vector search indexes. Designed primarily for static workloads, existing vector search indexes degrade in search quality and performance as the underlying data is updated unless costly index reconstruction is performed. To address this, we introduce Ada-IVF, an incremental indexing methodology for Inverted File (IVF) indexes. Ada-IVF consists of 1) an adaptive maintenance policy that decides which index partitions are problematic for performance and should be repartitioned and 2) a local re-clustering mechanism that determines how to repartition them. Compared with state-of-the-art dynamic IVF index maintenance strategies, Ada-IVF achieves an average of 2x and up to 5x higher update throughput across a range of benchmark workloads.
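To make the two components named in the abstract concrete, the following is a minimal sketch of a toy IVF index with a maintenance policy and a local re-clustering step. It is not the Ada-IVF algorithm: the class `ToyIVF`, the size-based rule in `flag_oversized`, the `factor` threshold, and the 2-means `split` are all hypothetical stand-ins chosen for illustration, assuming squared Euclidean distance and vectors stored as plain Python lists.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

class ToyIVF:
    """Toy IVF index: one centroid and one list of vectors per partition."""

    def __init__(self, centroids):
        self.centroids = [list(c) for c in centroids]
        self.partitions = [[] for _ in centroids]

    def insert(self, v):
        # Assign the vector to the partition with the nearest centroid.
        pid = min(range(len(self.centroids)),
                  key=lambda i: sq_dist(v, self.centroids[i]))
        self.partitions[pid].append(list(v))
        return pid

    def search(self, q, k, nprobe):
        # Scan only the nprobe partitions whose centroids are closest to q.
        order = sorted(range(len(self.centroids)),
                       key=lambda i: sq_dist(q, self.centroids[i]))
        candidates = [v for pid in order[:nprobe] for v in self.partitions[pid]]
        return sorted(candidates, key=lambda v: sq_dist(q, v))[:k]

    def flag_oversized(self, factor=2.0):
        # Hypothetical policy: flag partitions larger than factor * mean size.
        avg = sum(len(p) for p in self.partitions) / len(self.partitions)
        return [i for i, p in enumerate(self.partitions) if len(p) > factor * avg]

    def split(self, pid, iters=5, seed=0):
        # Hypothetical local re-clustering: 2-means over one partition only.
        vecs = self.partitions[pid]
        if len(vecs) < 2:
            return False
        rng = random.Random(seed)
        c = [list(v) for v in rng.sample(vecs, 2)]
        groups = [vecs, []]
        for _ in range(iters):
            groups = [[], []]
            for v in vecs:
                nearer = 0 if sq_dist(v, c[0]) <= sq_dist(v, c[1]) else 1
                groups[nearer].append(v)
            if not groups[0] or not groups[1]:
                return False  # degenerate split; leave the partition as-is
            c = [mean(groups[0]), mean(groups[1])]
        # Replace the old partition with one half and append the other half.
        self.centroids[pid], self.partitions[pid] = c[0], groups[0]
        self.centroids.append(c[1])
        self.partitions.append(groups[1])
        return True
```

In this sketch, a maintenance pass would call `flag_oversized` after a batch of inserts and then `split` on each flagged partition, leaving all other partitions untouched. The point is only to show the shape of "decide which partitions to fix, then fix them locally"; the criteria the paper actually uses are not reproduced here.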

Paper Structure

This paper contains 46 sections, 11 equations, 14 figures, 3 tables, and 4 algorithms.

Figures (14)

  • Figure 1: Static IVF indexes trained with balanced k-means on SIFT1M. Each point is a different initialization or number of iterations for k-means. As error increases, QPS at a 0.9 recall target degrades.
  • Figure 2: Evaluation of a Frozen IVF index on a SIFT1M dynamic workload with insert/delete ratio = 1. As partition imbalance increases, read throughput at a 0.9 recall target degrades.
  • Figure 3: Evaluation of IVF index reconstruction error as vectors from SIFT1M are inserted into the index. The error diverges from the error observed when the index is fully rebuilt.
  • Figure 4: Read and write access patterns of partitions in an internal entity search workload. Partition IDs are ordered by their read count. The read distribution is skewed and uncorrelated with the write distribution, showing that many partitions are modified but not read from.
  • Figure 5: Evaluation of LIRE and Ada-IVF for a dynamic MSTuring10m workload where queries are localized to a few partitions. LIRE and our locality-aware approach Ada-IVF achieve similar QPS, but LIRE creates $3\times$ as many partitions and requires $4\times$ the update time due to its maintenance of partitions that are not accessed by the queries.
  • ...and 9 more figures