On Storage Neural Network Augmented Approximate Nearest Neighbor Search

Taiga Ikeda, Daisuke Miyashita, Jun Deguchi

TL;DR

This work tackles large-scale ANN when the key vectors reside on storage devices rather than in memory, so that storage I/O latency dominates search time. It introduces a neural-network-based predictor to select the clusters likely to contain the ground-truth nearest neighbor and pairs this with a cluster-duplication strategy to improve recall without increasing in-memory footprint. Experiments on SIFT1M and CLIP show that the approach substantially reduces the amount of data read from storage: at 90% recall on SIFT1M, it fetches 80% less data than SPANN and 58% less than an exhaustive baseline that uses k-means clustering and linear search, indicating a meaningful latency improvement in storage-centric pipelines. The method is designed to be GPU-friendly and compatible with existing partitioning-based approaches, offering a practical path to scalable on-storage ANN for very large datasets.

Abstract

Large-scale approximate nearest neighbor search (ANN) has been gaining attention along with recent machine learning research that employs ANN. If the data is too large to fit in memory, the vectors most similar to a given query vector must be searched for in data held on storage devices rather than in memory. Storage devices such as NAND flash memory have larger capacity than memory devices such as DRAM, but they also have higher read latency. Therefore, ANN methods for storage require completely different approaches from conventional in-memory ANN methods. Under reasonable assumptions, search time is approximately determined by the amount of data fetched from storage alone, so our goal is to minimize that amount while maximizing recall. In partitioning-based ANN, vectors are partitioned into clusters in the index building phase. In the search phase, some of the clusters are chosen, the vectors in the chosen clusters are fetched from storage, and the nearest vector is retrieved from the fetched vectors. Thus, the key point is to accurately select the clusters containing the ground truth nearest neighbor vectors. We accomplish this by proposing a method to predict the correct clusters by means of a neural network that is gradually refined by alternating supervised learning and duplicated cluster assignment. Compared to state-of-the-art SPANN and an exhaustive method using k-means clustering and linear search, the proposed method achieves 90% recall on SIFT1M with 80% and 58% less data fetched from storage, respectively.
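
The partitioning-based pipeline described in the abstract (partition the keys into clusters, choose some clusters per query, fetch their vectors, return the nearest one) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses plain k-means with nearest-centroid cluster selection, which corresponds to the conventional baseline rather than the neural network predictor or duplicated cluster assignment. The function names (`sq_dists`, `kmeans`, `search`, `evaluate`) and the `n_probe` parameter are hypothetical, the keys are held in a NumPy array instead of on a storage device, and the data is synthetic rather than SIFT1M.

```python
import numpy as np


def sq_dists(a, b):
    """Squared Euclidean distances between rows of a (m, d) and rows of b (n, d)."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)


def kmeans(keys, n_clusters, n_iter=10, seed=0):
    """Plain Lloyd's k-means; returns (centroids, cluster index of each key)."""
    rng = np.random.default_rng(seed)
    centroids = keys[rng.choice(len(keys), n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        assign = sq_dists(keys, centroids).argmin(axis=1)
        for c in range(n_clusters):
            members = keys[assign == c]
            if len(members):  # leave a centroid unchanged if its cluster is empty
                centroids[c] = members.mean(axis=0)
    assign = sq_dists(keys, centroids).argmin(axis=1)
    return centroids, assign


def search(query, keys, centroids, assign, n_probe):
    """Probe the n_probe clusters with the nearest centroids.

    Returns (index of the best key found, number of vectors fetched).
    The fetch is simulated by indexing the in-memory array.
    """
    chosen = sq_dists(query[None, :], centroids)[0].argsort()[:n_probe]
    fetched = np.flatnonzero(np.isin(assign, chosen))
    if len(fetched) == 0:  # all chosen clusters were empty
        return -1, 0
    d = sq_dists(query[None, :], keys[fetched])[0]
    return int(fetched[d.argmin()]), len(fetched)


def evaluate(queries, keys, centroids, assign, n_probe):
    """Return (recall@1, mean number of vectors fetched per query)."""
    truth = sq_dists(queries, keys).argmin(axis=1)  # exact nearest neighbors
    hits, total = 0, 0
    for q, t in zip(queries, truth):
        found, n = search(q, keys, centroids, assign, n_probe)
        hits += int(found == t)
        total += n
    return hits / len(queries), total / len(queries)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    keys = rng.normal(size=(2000, 8))     # synthetic stand-in for key vectors
    queries = rng.normal(size=(100, 8))   # synthetic stand-in for query vectors
    centroids, assign = kmeans(keys, n_clusters=20)
    for n_probe in (1, 2, 5, 20):
        recall, fetched = evaluate(queries, keys, centroids, assign, n_probe)
        print(f"n_probe={n_probe:2d}  recall@1={recall:.2f}  mean fetched={fetched:.0f}")
```

Sweeping `n_probe` traces the trade-off the abstract refers to: probing more clusters raises recall@1 but also raises the number of vectors fetched, which is the quantity the paper aims to minimize.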

Paper Structure

This paper contains 15 sections, 2 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: (a) Recall@1 vs the number of vectors fetched from storage. It greatly depends on the number of clusters. (b) VQ under recall@1=90% vs the number of clusters. The line and error bars show the average and standard deviation of 10 measurements. In these experiments, we use one million SIFT1M base data as key vectors and ten thousand SIFT1B query data as query vectors.
  • Figure 2: Visualization with 2-dimensional toy data. (a) Key vectors are partitioned into four clusters. The cluster assignment is expressed by color. (b) Query vectors colored by the cluster chosen in the search phase by the conventional method. The query vectors are shown as light-colored circles. (c) Query vectors colored by the correct cluster, which contains the nearest key vector to each query vector. The query vectors are shown as light-colored circles and the key vectors as dark-colored rectangles. (d) Wrong choices are shown in gray.
  • Figure 3: Effect of our proposed method. The upper figures show the query vectors colored by the cluster predicted by the neural network. The bottom figures show the wrong cluster choices in gray. From left to right, the border lines predicted by the neural network fit the ground truth more closely as the training progresses, and the number of wrong cluster choices decreases.
  • Figure 4: Recall@1 vs the number of vectors fetched from storage.
  • Figure :
  • ...and 1 more figure