V3DB: Audit-on-Demand Zero-Knowledge Proofs for Verifiable Vector Search over Committed Snapshots

Zipeng Qiu, Wenjie Qu, Jiaheng Zhang, Binhang Yuan

TL;DR

V3DB is a verifiable, versioned vector-search service that enables audit-on-demand correctness checks for approximate nearest-neighbour (ANN) retrieval executed by a potentially untrusted service provider.

Abstract

Dense retrieval services increasingly underpin semantic search, recommendation, and retrieval-augmented generation, yet clients typically receive only a top-$k$ list with no auditable evidence of how it was produced. We present V3DB, a verifiable, versioned vector-search service that enables audit-on-demand correctness checks for approximate nearest-neighbour (ANN) retrieval executed by a potentially untrusted service provider. V3DB commits to each corpus snapshot and standardises an IVF-PQ search pipeline into a fixed-shape, five-step query semantics. Given a public snapshot commitment and a query embedding, the service returns the top-$k$ payloads and, when challenged, produces a succinct zero-knowledge proof that the output is exactly the result of executing the published semantics on the committed snapshot, without revealing the embedding corpus or private index contents. To make proving practical, V3DB avoids costly in-circuit sorting and random access by combining multiset equality/inclusion checks with lightweight boundary conditions. Our prototype implementation, based on Plonky2, achieves up to $22\times$ faster proving and up to $40\%$ lower peak memory consumption than the circuit-only baseline, with millisecond-level verification time. The code is available at https://github.com/TabibitoQZP/zk-IVF-PQ.
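To illustrate how a top-$k$ claim can be checked without sorting, the following is a minimal sketch in plain Python, not the paper's circuit. It assumes the prover supplies a split of the candidates into a selected part and a remainder; the checker then tests multiset equality plus a single boundary condition. The function name `check_topk` and the `(distance, id)` tuple encoding are hypothetical, and the constraints actually used by V3DB may differ.

```python
from collections import Counter

def check_topk(candidates, selected, rest):
    """Check a claimed top-k split without sorting `candidates`.

    Each element is a (distance, id) tuple (an assumed encoding);
    ties in distance are broken by id through tuple comparison.
    """
    # Multiset equality: selected + rest must be a permutation of candidates.
    if Counter(selected) + Counter(rest) != Counter(candidates):
        return False
    # Boundary condition: the largest selected element must not exceed
    # the smallest remaining element.
    if selected and rest and max(selected) > min(rest):
        return False
    return True

candidates = [(5, "a"), (2, "b"), (9, "c"), (2, "d")]
ok = check_topk(candidates, [(2, "b"), (2, "d")], [(5, "a"), (9, "c")])
bad_order = check_topk(candidates, [(2, "b"), (5, "a")], [(2, "d"), (9, "c")])
bad_member = check_topk(candidates, [(2, "b"), (1, "z")], [(5, "a"), (9, "c")])
```

The checker does work linear in the number of candidates, whereas verifying a full sort inside a circuit would require constraining every comparison of a sorting network.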

Paper Structure (82 sections, 38 equations, 2 figures, 9 tables, 2 algorithms)


Figures (2)

  • Figure 1: Workflow of V3DB. (i) Index shaping: from the original database $\{(v_t,\mathrm{item}^0_t)\}_{t=0}^{N_0-1}$, build a fixed-shape IVF-PQ snapshot via rebalancing and padding. (ii) Versioned snapshot layer: publish a snapshot identifier $\mathsf{com}=(\mathrm{root}_{\mathrm{mk}},\mathrm{root}_{\mathrm{cb}})$, where $\mathrm{root}_{\mathrm{mk}}$ is a Merkle root committing to the fixed-shape IVF layout and $\mathrm{root}_{\mathrm{cb}}$ is a hash digest of the PQ codebooks. (iii) Proving backend: on query embedding $q$, return top-$k$ payloads $(\mathrm{item}_0,\dots,\mathrm{item}_{k-1})$ and, upon challenge, a succinct ZK proof $\pi$ that the list equals the output of the fixed-shape IVF-PQ semantics on the snapshot committed by $\mathsf{com}$, while hiding snapshot contents and the trace.
  • Figure 2: Hierarchical Merkle commitment for the fixed-shape IVF layout. The top-level root $\mathrm{root}_{\mathrm{mk}}$ binds the centroid table $\boldsymbol{\mu}$ and the padded inverted-list slot records.
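The two-part snapshot identifier described in the captions above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's construction: it uses SHA-256 from Python's `hashlib` (a circuit-friendly hash is more likely in a Plonky2 setting), and the helper names `h`, `merkle_root`, and `snapshot_commitment`, the byte encodings, and the exact tree shape are all hypothetical.

```python
import hashlib

def h(*parts: bytes) -> bytes:
    """Hash length-prefixed parts with SHA-256 (assumed hash function)."""
    hasher = hashlib.sha256()
    for p in parts:
        hasher.update(len(p).to_bytes(4, "big"))
        hasher.update(p)
    return hasher.digest()

def merkle_root(leaves):
    """Binary Merkle root; an odd level duplicates its last node."""
    if not leaves:
        raise ValueError("a fixed-shape layout is assumed to be non-empty")
    level = [h(b"leaf", leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(b"node", level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

def snapshot_commitment(centroids, padded_lists, codebooks):
    """Return (root_mk, root_cb) for byte-encoded snapshot parts."""
    centroid_root = merkle_root(centroids)
    # One subtree per inverted list, over its padded slot records.
    list_roots = [merkle_root(slots) for slots in padded_lists]
    # The top-level root binds the centroid table and all list subtrees.
    root_mk = h(b"top", centroid_root, merkle_root(list_roots))
    # The codebooks are committed by a single hash digest.
    root_cb = h(b"cb", *codebooks)
    return root_mk, root_cb

centroids = [b"mu0", b"mu1"]
padded_lists = [[b"slot00", b"pad"], [b"slot10", b"slot11"]]
codebooks = [b"cb0", b"cb1"]
com = snapshot_commitment(centroids, padded_lists, codebooks)
# Changing a single slot record changes root_mk but leaves root_cb intact.
com2 = snapshot_commitment(centroids, [[b"slotXX", b"pad"], padded_lists[1]], codebooks)
```

Because every inverted list is padded to the same length, the tree shape is independent of the data, which is what allows a fixed-shape circuit to open it.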