VectorMaton: Efficient Vector Search with Pattern Constraints via an Enhanced Suffix Automaton

Haoxuan Xie, Siqiang Luo

TL;DR

VectorMaton is an automaton-based index that integrates pattern filtering with efficient vector search while keeping the index size comparable to the dataset size. In experiments on real-world datasets, it consistently outperforms all baselines.

Abstract

Approximate nearest neighbor search (ANNS) has become a cornerstone in modern vector database systems. Given a query vector, ANNS retrieves the closest vectors from a set of base vectors. In real-world applications, vectors are often accompanied by additional information, such as sequences or structured attributes, motivating the need for fine-grained vector search with constraints on this auxiliary data. Existing methods support attribute-based filtering or range-based filtering on categorical and numerical attributes, but they do not support pattern predicates over sequence attributes. In relational databases, predicates such as LIKE and CONTAINS are fundamental operators for filtering records based on substring patterns. As vector databases increasingly adopt SQL-style query interfaces, enabling pattern predicates over sequence attributes (e.g., texts and biological sequences) alongside vector similarity search becomes essential. In this paper, we formulate a novel problem: given a set of vectors each associated with a sequence, retrieve the nearest vectors whose sequences contain a given query pattern. To address this challenge, we propose VectorMaton, an automaton-based index that integrates pattern filtering with efficient vector search, while maintaining an index size comparable to the dataset size. Extensive experiments on real-world datasets demonstrate that VectorMaton consistently outperforms all baselines, achieving up to 10x higher query throughput at the same accuracy and up to 18x reduction in index size.
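
To make the problem statement concrete, the following is a minimal sketch of an exact, brute-force version of the query: filter by substring first, then rank the survivors by distance. It is not VectorMaton and is not taken from the paper. The function name `pattern_constrained_knn`, the use of Euclidean distance, and the toy data are all assumptions made for illustration.

```python
import heapq
import math


def pattern_constrained_knn(vectors, sequences, query, pattern, k):
    """Return indices of the k vectors closest to `query` among those
    whose associated sequence contains `pattern` as a substring.

    Exact pre-filtering sketch (hypothetical helper, not the paper's index):
    scans every sequence, so each query costs time linear in the dataset.
    """
    # Keep only the records that satisfy the pattern predicate.
    candidates = (i for i, seq in enumerate(sequences) if pattern in seq)
    # Rank the surviving records by Euclidean distance to the query vector.
    return heapq.nsmallest(
        k, candidates, key=lambda i: math.dist(vectors[i], query)
    )


# Toy data (assumed for illustration only).
vectors = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (0.1, 0.0)]
sequences = ["banana", "bandana", "cabana", "apple"]

# "apple" is close to the query but does not contain "ana", so it is excluded.
print(pattern_constrained_knn(vectors, sequences, (0.0, 0.0), "ana", k=2))
# prints [0, 1]
```

A full scan like this is the cost that an index combining pattern filtering with vector search is meant to avoid.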

Paper Structure

This paper contains 27 sections, 12 theorems, 12 figures, 3 tables, 3 algorithms.

Key Result

Theorem 1

Algorithm PC-HNSW requires $O(m^2)$ index space.

Figures (12)

  • Figure 1: An example of pattern-constrained ANNS in biological database.
  • Figure 2: Summary of challenges and methods.
  • Figure 3: The SAM structure of the sequence "banana".
  • Figure 4: An example of incrementally constructing SAM.
  • Figure 5: Comparison of different baselines.
  • ...and 7 more figures
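
Figures 3 and 4 refer to the suffix automaton (SAM) of "banana" and to its incremental construction. As a rough illustration of that data structure, the sketch below implements the textbook online SAM construction and a substring check. It is the plain suffix automaton, not the enhanced variant the paper proposes, and the class and function names (`SuffixAutomaton`, `build_sam`) are assumptions made for this example.

```python
class SuffixAutomaton:
    """Textbook suffix automaton, built one character at a time."""

    def __init__(self):
        self.next = [{}]    # outgoing transitions of each state
        self.link = [-1]    # suffix link of each state
        self.length = [0]   # length of the longest string in each state
        self.last = 0       # state representing the whole string so far

    def _new_state(self, length, link, transitions):
        self.next.append(transitions)
        self.length.append(length)
        self.link.append(link)
        return len(self.next) - 1

    def extend(self, ch):
        """Append one character to the indexed string."""
        cur = self._new_state(self.length[self.last] + 1, -1, {})
        p = self.last
        # Add the new transition along the suffix-link chain.
        while p != -1 and ch not in self.next[p]:
            self.next[p][ch] = cur
            p = self.link[p]
        if p == -1:
            self.link[cur] = 0
        else:
            q = self.next[p][ch]
            if self.length[p] + 1 == self.length[q]:
                self.link[cur] = q
            else:
                # Split q: the clone takes over its shorter strings.
                clone = self._new_state(
                    self.length[p] + 1, self.link[q], dict(self.next[q])
                )
                while p != -1 and self.next[p].get(ch) == q:
                    self.next[p][ch] = clone
                    p = self.link[p]
                self.link[q] = clone
                self.link[cur] = clone
        self.last = cur

    def contains(self, pattern):
        """Return True if `pattern` is a substring of the indexed string."""
        state = 0
        for ch in pattern:
            if ch not in self.next[state]:
                return False
            state = self.next[state][ch]
        return True


def build_sam(text):
    sam = SuffixAutomaton()
    for ch in text:
        sam.extend(ch)
    return sam


sam = build_sam("banana")
print(sam.contains("ana"), sam.contains("nab"))  # prints True False
```

Checking a pattern takes time proportional to the pattern length, independent of the indexed string, and the automaton has at most 2n - 1 states for a string of length n ≥ 2, which is what makes it attractive as a compact substring index.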

Theorems & Definitions (27)

  • Definition 1: ANNS with pattern constraints
  • Example 1
  • Example 2
  • Example 3
  • Theorem 1
  • Example 4
  • Definition 2: Position list (poslist)
  • Example 5
  • Definition 3: Equivalence class
  • Definition 4: Maximal pattern
  • ...and 17 more