SQUASH: Serverless and Distributed Quantization-based Attributed Vector Similarity Search

Joe Oakley; Hakan Ferhatosmanoglu

TL;DR

SQUASH addresses scalable, high-dimensional vector search with rich attribute filtering in a fully serverless setting. It introduces Optimized Scalar Quantization (OSQ), a segment-based, non-uniform quantization scheme that compresses both vectors and attributes, supporting fast, parallel filtering and distance estimation while avoiding heavy re-ranking. The system runs a multi-stage distributed pipeline (coarse partitioning, low-bit pruning, and lower-bound (LB) distance lookups). Two mechanisms complement the pipeline: Data Retention Exploitation (DRE), which reuses data retained in re-used function containers to reduce I/O, and a tree-based Function-as-a-Service (FaaS) invocation scheme that scales to thousands of concurrent functions. Experiments on multiple benchmarks report up to 18x higher throughput and cost savings of up to 9x over state-of-the-art serverless and server-based baselines, supporting SQUASH's practicality for elastic, hybrid vector search workloads.
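
To make the idea of quantization-based lower-bound distance estimation concrete, the following is a minimal sketch of a generic non-uniform scalar quantizer. It is not the paper's OSQ: the quantile-based cell boundaries, the fixed bit count per dimension, and the function names `quantile_boundaries`, `encode`, and `lower_bound_sq` are all assumptions made for illustration. Each coordinate is mapped to a cell index, and the distance from a query to the cell gives a bound that never exceeds the true distance for vectors the boundaries were built from.

```python
from bisect import bisect_right


def quantile_boundaries(values, bits):
    """Split one dimension into 2**bits cells holding roughly equal counts."""
    ordered = sorted(values)
    cells = 2 ** bits
    n = len(ordered)
    # Interior cut points sit at equally spaced ranks; outer edges are min and max.
    inner = [ordered[(i * n) // cells] for i in range(1, cells)]
    return [ordered[0]] + inner + [ordered[-1]]


def encode(vector, boundaries):
    """Map each coordinate to the index of the cell that contains it."""
    codes = []
    for x, b in zip(vector, boundaries):
        cell = bisect_right(b, x) - 1
        # Clamp so the maximum value falls into the last cell.
        codes.append(min(max(cell, 0), len(b) - 2))
    return codes


def lower_bound_sq(query, codes, boundaries):
    """Squared-distance lower bound: per dimension, the gap from query to cell."""
    total = 0.0
    for q, c, b in zip(query, codes, boundaries):
        lo, hi = b[c], b[c + 1]
        gap = lo - q if q < lo else (q - hi if q > hi else 0.0)
        total += gap * gap
    return total


# Illustrative data (hypothetical): four 2-dimensional vectors, 1 bit per dimension.
data = [[0.1, 5.0], [0.4, 1.0], [0.9, 3.0], [0.2, 9.0]]
bounds = [quantile_boundaries([v[d] for v in data], bits=1) for d in range(2)]
codes = [encode(v, bounds) for v in data]
```

Because the bound only needs the cell indices and the boundary table, candidates whose bound already exceeds the current k-th best distance can be discarded without reading the full-precision vector.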

Abstract

Vector similarity search presents significant challenges in terms of scalability for large and high-dimensional datasets, as well as in providing native support for hybrid queries. Serverless computing and cloud functions offer attractive benefits such as elasticity and cost-effectiveness, but are difficult to apply to data-intensive workloads. Jointly addressing these two main challenges, we present SQUASH, the first fully serverless vector search solution with rich support for hybrid queries. It features OSQ, an optimized and highly parallelizable quantization-based approach for vectors and attributes. Its segment-based storage mechanism enables significant compression in resource-constrained settings and offers efficient dimensional extraction operations. SQUASH performs a single distributed pass to guarantee the return of sufficiently many vectors satisfying the filter predicate, achieving high accuracy and avoiding redundant computation for vectors which fail the predicate. A multi-level search workflow is introduced to prune most vectors early to minimize the load on Function-as-a-Service (FaaS) instances. SQUASH is designed to identify and utilize retention of relevant data in re-used runtime containers, which eliminates redundant I/O and reduces costs. Finally, we demonstrate a new tree-based method for rapid FaaS invocation, enabling the bi-directional flow of data via request/response payloads. Experiments comparing SQUASH with state-of-the-art serverless vector search solutions and server-based baselines on vector search benchmarks confirm significant performance improvements at a lower cost.
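
The tree-based invocation idea can be illustrated with a small local simulation. This is a sketch under stated assumptions, not the authors' implementation: `invoke` is a hypothetical stand-in for a synchronous FaaS call, threads play the role of function instances, the branching factor `FANOUT` and the heap-style worker numbering are invented for the example, and `local_work` is a placeholder for the per-function search. The request payload travels down the tree and each worker returns its merged top-k results in its response payload.

```python
from concurrent.futures import ThreadPoolExecutor

FANOUT = 3  # assumed branching factor; the source does not state one


def child_ids(node_id, total, fanout=FANOUT):
    """IDs of the workers that `node_id` launches, using heap-style numbering."""
    first = node_id * fanout + 1
    return [c for c in range(first, first + fanout) if c < total]


def local_work(node_id, request):
    """Placeholder for the per-function search; returns (distance, id) pairs."""
    return [(abs(request["query"] - node_id), node_id)]


def invoke(node_id, request, total):
    """Hypothetical stand-in for a synchronous FaaS call on worker `node_id`."""
    children = child_ids(node_id, total)
    with ThreadPoolExecutor(max_workers=max(1, len(children))) as pool:
        # Launch children first so their subtrees start while this node works.
        futures = [pool.submit(invoke, c, request, total) for c in children]
        results = local_work(node_id, request)
        for future in futures:
            # Each child's response payload carries its subtree's best candidates.
            results.extend(future.result())
    # Trim to k before returning so response payloads stay small.
    return sorted(results)[: request["k"]]


# Root call: 13 simulated workers are reached in three levels instead of
# being invoked one by one from a single coordinator.
top = invoke(0, {"query": 7, "k": 3}, total=13)
```

Trimming to k at every level does not change the final answer, since the global top-k is always contained in the union of each subtree's top-k.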
