Multi-Attribute Group Fairness in $k$-NN Queries on Vector Databases

Thinh On, Senjuti Basu Roy, Baruch Schieber

TL;DR

This paper provides a computational framework that produces high-quality approximate nearest neighbors with good trade-offs between search time, memory/indexing cost, and recall, together with theoretical guarantees. It also identifies efficiency--fairness trade-offs and empirically shows that existing vector search methods cannot be directly adapted for fairness.

Abstract

We initiate the study of multi-attribute group fairness in $k$-nearest neighbor ($k$-NN) search over vector databases. Unlike prior work that optimizes efficiency or query filtering, fairness imposes count constraints to ensure proportional representation across groups defined by protected attributes. When fairness spans multiple attributes, these constraints must be satisfied simultaneously, making the problem computationally hard. To address this, we propose a computational framework that produces high-quality approximate nearest neighbors with good trade-offs between search time, memory/indexing cost, and recall. We adapt locality-sensitive hashing (LSH) to accelerate candidate generation and build a lightweight index over the Cartesian product of protected attribute values. Our framework retrieves candidates satisfying joint count constraints and then applies a post-processing stage to construct fair $k$-NN results across all attributes. For 2 attributes, we present an exact polynomial-time flow-based algorithm; for 3 or more, we formulate ILP-based exact solutions with higher computational cost. We provide theoretical guarantees, identify efficiency--fairness trade-offs, and empirically show that existing vector search methods cannot be directly adapted for fairness. Experimental evaluations demonstrate the generality and scalability of the proposed framework.
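
As a rough illustration of the indexing idea in the abstract, the sketch below partitions points by their tuple of protected attribute values and keeps a small family of LSH tables per partition. It is a minimal toy under stated assumptions, not the authors' implementation: the class name `PartitionedLSH`, its parameters (`num_tables`, `hash_len`, `width`), the choice of a p-stable hash of the form floor((a.x + b) / w), and the decision to share hash functions across partitions are all hypothetical.

```python
import math
import random
from collections import defaultdict


class PartitionedLSH:
    """Toy index: one group of LSH tables per attribute-value combination.

    Assumption: hash functions are shared by all partitions; the paper
    may configure them differently.
    """

    def __init__(self, dim, num_tables=4, hash_len=3, width=4.0, seed=0):
        rng = random.Random(seed)
        self.width = width
        # Each table concatenates hash_len hashes h(x) = floor((a.x + b) / w).
        self.tables = [
            [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
             for _ in range(hash_len)]
            for _ in range(num_tables)
        ]
        # buckets[partition][table_id][key] -> list of point ids
        self.buckets = defaultdict(
            lambda: [defaultdict(list) for _ in range(num_tables)]
        )
        self.points = {}

    def _key(self, table, x):
        # Concatenated hash value of x for one table.
        return tuple(
            math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / self.width)
            for a, b in table
        )

    def add(self, pid, x, attrs):
        # Store the point in every table of its own partition only.
        self.points[pid] = x
        part = self.buckets[tuple(attrs)]
        for t, table in enumerate(self.tables):
            part[t][self._key(table, x)].append(pid)

    def candidates(self, q, attrs):
        # Union of colliding points inside one partition, nearest first.
        part = self.buckets.get(tuple(attrs))
        if part is None:
            return []
        found = set()
        for t, table in enumerate(self.tables):
            found.update(part[t].get(self._key(table, q), []))
        return sorted(found, key=lambda pid: math.dist(self.points[pid], q))


idx = PartitionedLSH(dim=2)
idx.add("p1", [0.0, 0.0], ("Male", "Asian"))
idx.add("p2", [0.1, 0.0], ("Female", "Asian"))
print(idx.candidates([0.0, 0.0], ("Male", "Asian")))
```

Querying one partition at a time lets a caller collect candidates per attribute combination before any fairness post-processing is applied.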

Paper Structure

This paper contains 38 sections, 9 theorems, 17 equations, 9 figures, 3 tables, 2 algorithms.

Key Result

theorem 1

3+-Fair-KNN: The problem of deciding the feasibility of a general instance of the 3-attribute case (and thus any $m\geq 3$ attributes as well) is strongly NP-hard.

Figures (9)

  • Figure 1: Preprocessing vectors: partitioning by Cartesian product of attribute values and locality-sensitive hashing. For each partition $\pi$, we employ $\ell$ concatenated hash functions, each of length $\mu$, to form $\ell$ hash tables.
  • Figure 2: An example bipartite graph for the two-attribute case. To satisfy fairness constraints, we select $k$ edges from $E$ that respect the $\hat{\beta}$ values demanded by Source and Sink. Note that a pair $(v, t)$ may have multiple parallel edges, one per data point in the candidate set $\mathcal{C}$, as shown for the pair $(v, t) = (\text{Male, Asian})$, for which there are 3 Male Asian candidates.
  • Figure 3: Average DAF varying (a) $k$ on Audio dataset, and (b) $m$ on CelebA dataset. Zero DAF means infeasible solutions.
  • Figure 4: Comparing different fairness-based baselines. (a) recall@$k$ with varying $k$, (b) recall@$k$ with varying $m$, (c) successful queries (%) with varying $k$, (d) successful queries (%) with varying $m$. We vary $k$ on the FairFace dataset and $m$ on the CelebA dataset, and set $\ell=128$ for the LSH-based methods.
  • Figure 5: Qualitative evaluation varying LSH parameters using FairFace dataset. (a) recall@$k$ varying $w$, (b) recall@$k$ varying $\ell$, (c) successful queries (%) varying $w$, (d) successful queries (%) varying $\ell$.
  • ...and 4 more figures
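
The bipartite construction in the Figure 2 caption can be sketched as a small min-cost flow. This is a hedged illustration rather than the paper's algorithm: it assumes each attribute value has an exact required count, that the counts of both attributes sum to the same $k$, and that the objective is the minimum total distance. The function name `fair_select` and the input layout are hypothetical.

```python
import math


def fair_select(candidates, quota_a, quota_b):
    """candidates: list of (point_id, value_a, value_b, distance).
    quota_a, quota_b: dict mapping value -> exact required count (assumption).
    Returns k point ids with minimum total distance, or None if infeasible."""
    k = sum(quota_a.values())
    if k != sum(quota_b.values()):
        return None
    node = {"S": 0, "T": 1}
    for v in quota_a:
        node[("a", v)] = len(node)
    for t in quota_b:
        node[("b", t)] = len(node)
    n = len(node)
    graph = [[] for _ in range(n)]
    edges = []  # [to, capacity, cost, point_id]; edges i and i ^ 1 are paired

    def add_edge(u, v, cap, cost, pid=None):
        graph[u].append(len(edges))
        edges.append([v, cap, cost, pid])
        graph[v].append(len(edges))
        edges.append([u, 0, -cost, None])

    for v, q in quota_a.items():
        add_edge(0, node[("a", v)], q, 0.0)      # Source -> first attribute
    for t, q in quota_b.items():
        add_edge(node[("b", t)], 1, q, 0.0)      # second attribute -> Sink
    for pid, v, t, dist in candidates:
        if v in quota_a and t in quota_b:
            # One parallel unit-capacity edge per candidate point.
            add_edge(node[("a", v)], node[("b", t)], 1, dist, pid)

    for _unit in range(k):
        # Bellman-Ford shortest path in the residual graph.
        best = [math.inf] * n
        best[0] = 0.0
        prev = [None] * n
        for _round in range(n - 1):
            changed = False
            for u in range(n):
                if best[u] == math.inf:
                    continue
                for ei in graph[u]:
                    to, cap, cost, _pid = edges[ei]
                    if cap > 0 and best[u] + cost < best[to] - 1e-12:
                        best[to] = best[u] + cost
                        prev[to] = ei
                        changed = True
            if not changed:
                break
        if prev[1] is None:
            return None  # fewer than k units can be routed
        v = 1
        while v != 0:  # push one unit back along the path
            ei = prev[v]
            edges[ei][1] -= 1
            edges[ei ^ 1][1] += 1
            v = edges[ei ^ 1][0]
    # Saturated candidate edges are the selected points.
    return [e[3] for e in edges if e[3] is not None and e[1] == 0]


cands = [
    ("p1", "Male", "Asian", 0.1), ("p2", "Male", "Asian", 0.2),
    ("p3", "Male", "White", 0.3), ("p4", "Female", "Asian", 0.9),
    ("p5", "Female", "White", 0.4),
]
print(fair_select(cands, {"Male": 1, "Female": 1}, {"Asian": 1, "White": 1}))
# → ['p1', 'p5']
```

In this toy input the two nearest points, p1 and p2, are both Male Asian, so the unconstrained top-2 violates both quotas; the flow instead pairs p1 with p5.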

Theorems & Definitions (9)

  • theorem 1
  • theorem 2: Number of Hash Tables in Alg-Near-$\pi$
  • theorem 3: Expected Number of False Positives
  • theorem 4: Running Time of Alg-Near-$\pi$
  • corollary 1: Running Time of Alg-Near-Neighbor
  • theorem 5
  • theorem 6
  • theorem 7
  • theorem 8