Compass: General Filtered Search across Vector and Structured Data

Chunxiao Ye, Xiao Yan, Eric Lo

TL;DR

Compass addresses the challenge of general filtered search over hybrid vector and structured data by integrating a vector proximity graph with clustered B+-trees for relational attributes, all driven by a shared candidate queue. Its core innovation is a progressive, predicate-aware search that adaptively expands the graph search or consults relational indices to maintain connectivity and predicate satisfaction, without building new specialized indices. Empirical results show Compass outperforming NaviX across conjunctions and disjunctions, while approaching the efficiency of specialized single-attribute indices in single-attribute scenarios, and maintaining DBMS compatibility with modest storage overhead. The approach offers a practical, flexible, and robust solution for truly general filtered search in vector DBMSs, enabling scalable multi-attribute filtering with high recall at competitive throughput.

Abstract

The increasing prevalence of hybrid vector and relational data necessitates efficient, general support for queries that combine high-dimensional vector search with complex relational filtering. However, existing filtered search solutions are fundamentally limited by specialized indices, which restrict arbitrary filtering and hinder integration with general-purpose DBMSs. This work introduces Compass, a unified framework that enables general filtered search across vector and structured data without relying on new index designs. Compass leverages established index structures (such as HNSW and IVF for vector attributes, and B+-trees for relational attributes) and implements a principled cooperative query execution strategy that coordinates candidate generation and predicate evaluation across modalities. Uniquely, Compass maintains generality by allowing arbitrary conjunctions, disjunctions, and range predicates, while remaining robust even with highly selective or multi-attribute filters. Comprehensive empirical evaluations demonstrate that Compass consistently outperforms NaviX, the only existing performant general framework, across diverse hybrid query workloads. It also matches the query throughput of specialized single-attribute indices in the single-attribute settings that favor them, all while maintaining full generality and DBMS compatibility. Overall, Compass offers a practical and robust solution for achieving truly general filtered search in vector database systems.
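To make the cooperative execution idea concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes a brute-force k-NN graph as a stand-in for HNSW, one sorted `(attr, id)` list per cluster as a stand-in for the clustered B+-trees, a single integer attribute with a range predicate, and a simple rule (hypothetical parameters `min_pass_ratio` and `batch`) for deciding when to pull matching ids from the relational side into the shared candidate queue. All function and parameter names are illustrative.

```python
import bisect
import heapq
import math
import random


def build_knn_graph(vectors, m):
    """Brute-force m-nearest-neighbour graph (toy stand-in for HNSW)."""
    graph = []
    for i, v in enumerate(vectors):
        others = [j for j in range(len(vectors)) if j != i]
        others.sort(key=lambda j: math.dist(v, vectors[j]))
        graph.append(others[:m])
    return graph


def build_clustered_index(vectors, attrs, centroids):
    """One sorted (attr, id) list per cluster (toy stand-in for clustered B+-trees)."""
    clusters = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: math.dist(v, centroids[c]))
        clusters[nearest].append((attrs[i], i))
    for rows in clusters:
        rows.sort()
    return clusters


def relational_stream(query, lo, hi, centroids, clusters):
    """Yield ids with lo <= attr <= hi, visiting the nearest cluster first."""
    order = sorted(range(len(centroids)), key=lambda c: math.dist(query, centroids[c]))
    for c in order:
        rows = clusters[c]
        for attr, i in rows[bisect.bisect_left(rows, (lo, -1)):]:
            if attr > hi:
                break
            yield i


def filtered_search(query, lo, hi, vectors, attrs, graph, centroids, clusters,
                    k=5, entry=0, min_pass_ratio=0.5, batch=4):
    """Approximate top-k ids by distance among points with lo <= attr <= hi."""
    stream = relational_stream(query, lo, hi, centroids, clusters)
    visited = {entry}
    candidates = [(math.dist(query, vectors[entry]), entry)]  # shared min-heap
    results = []  # max-heap on distance, stored as (-distance, id)

    def push(v):
        """Add v to the shared queue once; return True if it was new."""
        if v in visited:
            return False
        visited.add(v)
        heapq.heappush(candidates, (math.dist(query, vectors[v]), v))
        return True

    while candidates:
        d, u = heapq.heappop(candidates)
        if len(results) == k and d > -results[0][0]:
            break  # closest queued candidate is farther than the k-th result
        if lo <= attrs[u] <= hi:
            heapq.heappush(results, (-d, u))
            if len(results) > k:
                heapq.heappop(results)
        passing = sum(lo <= attrs[v] <= hi for v in graph[u])
        for v in graph[u]:
            push(v)
        if passing < min_pass_ratio * len(graph[u]):
            # Assumed rule: few neighbours satisfy the predicate, so top up the
            # shared queue with up to `batch` matching ids from the relational side.
            added = 0
            for v in stream:
                added += push(v)
                if added == batch:
                    break
    return [i for _, i in sorted((-neg_d, i) for neg_d, i in results)]


# Toy data: 200 random 2-D points, one integer attribute in [0, 100).
random.seed(0)
vectors = [(random.random(), random.random()) for _ in range(200)]
attrs = [random.randrange(100) for _ in range(200)]
centroids = [(0.25, 0.25), (0.25, 0.75), (0.75, 0.25), (0.75, 0.75)]
graph = build_knn_graph(vectors, m=8)
clusters = build_clustered_index(vectors, attrs, centroids)
ids = filtered_search((0.5, 0.5), 10, 20, vectors, attrs, graph, centroids, clusters)
print(ids)
```

The point of the sketch is the single `candidates` heap: graph neighbours and relational-index hits compete in one queue ordered by vector distance, so a selective predicate cannot strand the search in a region where few nodes match. The result is approximate, and the switching rule here is a placeholder rather than the paper's progressive strategy.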

Paper Structure

This paper contains 24 sections, 3 equations, 11 figures, 5 tables, 4 algorithms.

Figures (11)

  • Figure 1: An illustration of a proximity graph (left) and an IVF index (right).
  • Figure 2: An illustration of the Compass index. Only the bottom-layer graph is shown for HNSW, and each color denotes a cluster in the IVF index.
  • Figure 3: An illustration of CompassSearch with a single attribute $A$. Gray nodes pass the predicate while black ones do not. The green node marks the top-1 result. The orange dot marks the query vector, and the orange star marks the graph entry point. Cyan, blue, and yellow represent different clusters. For conciseness, the B-tree is hidden in the figure.
  • Figure 4: Conjunction Range Filtering. 0.9 Recall.
  • Figure 5: Conjunction Range Filtering. 0.85/0.95 Recall. On VIDEO and GIST.
  • ...and 6 more figures

Theorems & Definitions (4)

  • Definition 1
  • Definition 2
  • Example 1
  • Example 2