Efficient Vector Search in the Wild: One Model for Multi-K Queries

Yifan Peng, Jiafei Fan, Xingda Wei, Sijie Shen, Rong Chen, Jianning Wang, Xiaojian Luo, Wenyuan Yu, Jingren Zhou, Haibo Chen

TL;DR

OMEGA is a K-generalizable learned top-K search method that simultaneously achieves high accuracy, high performance, and low preprocessing cost for multi-K vector queries.

Abstract

Learned top-K search is a promising approach for serving vector queries with both high accuracy and performance. However, current models trained for a specific K value fail to generalize to real-world multi-K queries: they suffer from accuracy degradation (for larger Ks) and performance loss (for smaller Ks). Training the model to generalize on different Ks requires orders of magnitude more preprocessing time and is not suitable for serving vector queries in the wild. We present OMEGA, a K-generalizable learned top-K search method that simultaneously achieves high accuracy, high performance, and low preprocessing cost for multi-K vector queries. The key idea is that a base model properly trained on K=1 with our trajectory-based features can be used to accurately predict larger Ks with a dynamic refinement procedure and smaller Ks with minimal performance loss. To make our refinements efficient, we further leverage the statistical properties of top-K searches to reduce excessive model invocations. Extensive evaluations on multiple public and production datasets show that, under the same preprocessing budgets, OMEGA achieves 6-33% lower average latency compared to state-of-the-art learned search methods, while all systems achieve the same recall target. With only 16-30% of the preprocessing time, OMEGA attains 1.01-1.28x of the optimal average latency of these baselines.
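
The abstract does not spell out the refinement procedure, so the sketch below is only an illustration of the general shape of the idea: answering a top-K query by repeatedly applying a K=1 search over a search set that shrinks as results are found. It assumes an exact brute-force nearest-neighbor scan as a stand-in for the learned K=1 model, and all names (`search_top1`, `search_topk_via_top1`, `recall_at_k`, `s_set`) are hypothetical rather than taken from OMEGA's implementation.

```python
from typing import List, Sequence, Set

Vector = Sequence[float]


def squared_l2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search_top1(query: Vector, vectors: List[Vector], s_set: Set[int]) -> int:
    """Stand-in for a K=1 search (assumption: exact scan, not a learned model).

    Returns the id of the vector in s_set closest to the query;
    ties are broken by the smaller id so the result is deterministic.
    """
    return min(s_set, key=lambda i: (squared_l2(query, vectors[i]), i))


def search_topk_via_top1(query: Vector, vectors: List[Vector], k: int) -> List[int]:
    """Build a top-k answer from repeated top-1 searches.

    After each round the found id is removed from the search set,
    so the next round's top-1 is the next-nearest vector overall.
    """
    s_set = set(range(len(vectors)))
    found: List[int] = []
    while len(found) < k and s_set:
        best = search_top1(query, vectors, s_set)
        found.append(best)
        s_set.remove(best)
    return found


def recall_at_k(found: List[int], truth: List[int]) -> float:
    """Fraction of the true top-k ids that appear in the returned ids."""
    if not truth:
        raise ValueError("truth must be non-empty")
    return len(set(found) & set(truth)) / len(truth)


if __name__ == "__main__":
    vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
    query = (0.1, 0.0)
    print(search_topk_via_top1(query, vectors, k=3))
```

With an exact top-1 search, this loop always reaches full recall; a learned search that stops early trades some of that recall for latency, which is why the evaluation compares latency at a fixed recall target.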

Paper Structure (23 sections, 1 equation, 19 figures, 1 table, 2 algorithms)

Figures (19)

  • Figure 1: Serving vector queries in the wild.
  • Figure 2: (a) Multi-K query workloads sampled and (b) A breakdown of the different K access patterns on a sample of collections with the access frequencies normalized (norm.) to the total accesses of the queried collection.
  • Figure 3: A profile of the CPU computation power provisioned for serving and preprocessing vector collections in one cluster.
  • Figure 4: A characterization of the performance (search latency) vs. accuracy (recall) trade-off for three sampled queries from a production collection in Alibaba.
  • Figure 5: (a) An analysis showing that a model trained on one specific $K$ fails to generalize to other $K$s. (b) An illustration of why generalization from small to large $K$s has accuracy issues due to under-searching. (c) An illustration of why generalization from large to small $K$s has performance issues due to over-searching. (d) An overview of our $K$-generalizable learned search using only one model ($Search_k$) that is capable of searching for top-$k$ vectors on a search set (s_set). Note that for each row in (d) the search set (s_set) is dynamically evolving.
  • ...and 14 more figures