Welfarist Formulations for Diverse Similarity Search

Siddharth Barman, Nirjhar Das, Shivam Gupta, Kirankumar Shiragur

TL;DR

This work introduces a welfare-based framework for diversity in nearest neighbor search (NNS). It models attributes as agents and query-vector similarities as utilities, which yields objectives based on Nash social welfare (NSW). The framework generalizes to a family of $p$-mean welfare objectives that interpolate between pure relevance and diversity, and it comes with algorithms that have provable guarantees: a single-attribute, NSW-optimizing greedy method that works with exact or approximate ANN, and a multi-attribute greedy approach with a $(1-1/e)$ guarantee on the log-NSW objective despite NP-hardness. The approach avoids the fixed quota constraints inherent in prior work and yields query-dependent trade-offs between relevance and diversity, as demonstrated on real and semi-synthetic datasets. The results show that NSW and $p$-mean welfare effectively balance diversity and relevance, with practical runtime characteristics and variants such as the p-FetchUnion-ANN heuristic offering substantial speedups. Overall, the paper provides a principled, flexible, and scalable solution to diverse NNS, with theoretical guarantees and empirical validation across single- and multi-attribute settings.
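To make the interpolation between relevance and diversity concrete, here is a minimal sketch of the standard generalized ($p$-)mean applied to per-attribute utilities. It is an illustration under stated assumptions, not the paper's definition: the function name `p_mean_welfare`, the requirement that utilities be strictly positive, and the two example utility profiles are all hypothetical. At $p = 1$ the mean is the arithmetic average (which rewards total relevance), at $p \to 0$ it is the geometric mean (Nash social welfare), and at $p \to -\infty$ it is the minimum utility.

```python
import math

def p_mean_welfare(utilities, p):
    """Generalized p-mean of strictly positive utilities (illustrative sketch)."""
    if any(u <= 0 for u in utilities):
        raise ValueError("this sketch assumes strictly positive utilities")
    n = len(utilities)
    if p == 0:
        # Limit p -> 0: geometric mean, i.e. Nash social welfare.
        return math.exp(sum(math.log(u) for u in utilities) / n)
    if p == -math.inf:
        # Limit p -> -infinity: the worst-off attribute decides.
        return min(utilities)
    return (sum(u ** p for u in utilities) / n) ** (1.0 / p)

# Hypothetical per-attribute utilities of two candidate result sets.
skewed = [2.7, 0.1, 0.1]     # almost all similarity mass on one attribute
balanced = [0.9, 0.9, 0.9]   # slightly less total similarity, spread evenly

for p in (1, 0, -1, -math.inf):
    print(p, p_mean_welfare(skewed, p), p_mean_welfare(balanced, p))
```

In this toy comparison the skewed profile scores higher at $p = 1$, while the balanced profile scores higher at $p = 0$ and below, which is the sense in which lowering $p$ shifts the objective toward diversity.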

Abstract

Nearest Neighbor Search (NNS) is a fundamental problem in data structures with wide-ranging applications, such as web search, recommendation systems, and, more recently, retrieval-augmented generation (RAG). In such recent applications, in addition to the relevance (similarity) of the returned neighbors, diversity among the neighbors is a central requirement. In this paper, we develop principled welfare-based formulations in NNS for realizing diversity across attributes. Our formulations are based on welfare functions -- from mathematical economics -- that satisfy central diversity (fairness) and relevance (economic efficiency) axioms. With a particular focus on Nash social welfare, we note that our welfare-based formulations provide objective functions that adaptively balance relevance and diversity in a query-dependent manner. Notably, such a balance was not present in the prior constraint-based approach, which forced a fixed level of diversity and optimized for relevance. In addition, our formulation provides a parametric way to control the trade-off between relevance and diversity, giving practitioners the flexibility to tailor search results to task-specific requirements. We develop efficient nearest neighbor algorithms with provable guarantees for the welfare-based objectives. Notably, our algorithm can be applied on top of any standard ANN method (i.e., using a standard ANN method as a subroutine) to efficiently find neighbors that approximately maximize our welfare-based objectives. Experimental results demonstrate that our approach is practical and substantially improves diversity while maintaining high relevance of the retrieved neighbors.

Paper Structure

This paper contains 35 sections, 16 theorems, 41 equations, 28 figures, 9 tables, 3 algorithms.

Key Result

Theorem 1

In the single-attribute setting, given any query $q \in \mathbb{R}^d$ and an (exact) oracle ENN for the $k$ most similar vectors from any set, the Nash-ANN algorithm returns an optimal solution for NaNNS, i.e., it returns a size-$k$ subset $\textsc{Alg} \subseteq P$ that satisfies …
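The following is a minimal sketch of a greedy selection for a log-NSW objective in the single-attribute setting. It is not the paper's Nash-ANN algorithm, whose details are not reproduced on this page. The assumptions are hypothetical and chosen for illustration: each attribute's utility is the sum of the similarities of the selected vectors carrying that attribute, similarities are non-negative, a small smoothing constant `eta` keeps the logarithm defined at zero utility, and a brute-force sort stands in for the exact oracle.

```python
import math
from collections import defaultdict

def greedy_log_nsw(sims, attrs, k, eta=1e-3):
    """Greedily pick k indices to increase sum_c log(u_c + eta).

    sims[i]  : similarity of vector i to the query (assumed non-negative)
    attrs[i] : the single attribute (e.g. colour) of vector i
    eta      : hypothetical smoothing constant so log is defined at u_c = 0
    """
    # Brute-force stand-in for an exact oracle: per attribute, the candidate
    # indices sorted by decreasing similarity to the query.
    by_attr = defaultdict(list)
    for i in sorted(range(len(sims)), key=lambda j: -sims[j]):
        by_attr[attrs[i]].append(i)

    utility = defaultdict(float)  # u_c: total similarity selected so far for c
    next_pos = defaultdict(int)   # position of the next unused candidate of c
    chosen = []
    for _ in range(min(k, len(sims))):
        best_attr, best_gain = None, -math.inf
        for c, cand in by_attr.items():
            if next_pos[c] == len(cand):
                continue  # attribute c has no candidates left
            s = sims[cand[next_pos[c]]]
            # Marginal gain in log-utility from adding c's best unused vector.
            gain = math.log(utility[c] + eta + s) - math.log(utility[c] + eta)
            if gain > best_gain:
                best_attr, best_gain = c, gain
        i = by_attr[best_attr][next_pos[best_attr]]
        chosen.append(i)
        utility[best_attr] += sims[i]
        next_pos[best_attr] += 1
    return chosen

# Hypothetical toy data: three near-duplicate "blue" items and two others.
sims = [0.95, 0.94, 0.93, 0.60, 0.55]
attrs = ["blue", "blue", "blue", "red", "green"]
chosen = greedy_log_nsw(sims, attrs, k=3)
print(chosen, [attrs[i] for i in chosen])
```

On this toy input, plain top-3 by similarity would return three blue items, whereas the greedy log-utility rule picks one item from each attribute because the gain from a second blue item is much smaller than the gain from a first red or green one. How strongly the rule favours new attributes depends on the assumed `eta`.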

Figures (28)

  • Figure 1: Neighbor search results ($k=9$) on the Amazon dataset. From left: First and Second images - ANN and Nash-based results for query "shirts", respectively. Third and Fourth images - ANN and Nash-based results for query "blue shirt", respectively. Note that the Nash-based method selects diverse colors for the query "shirts" but conforms to the blue color for the query "blue shirt".
  • Figure 4: Columns 1 and 2 - Comparison of approximation ratio versus entropy trade-offs between Nash-ANN and Div-ANN with varying $k'$, for $k = 50$ on the Amazon and Deep1b-(Clus) datasets in the single-attribute setting. Columns 3 and 4 - Comparison of approximation ratio versus entropy trade-offs (across attribute classes $C_1$ and $C_2$) between Multi Nash-ANN and Multi Div-ANN with varying $k'$ on the Sift1m-(Clus) dataset with $k = 50$ in the multi-attribute setting.
  • Figure 5: Columns 1 and 2 - Approximation ratio versus entropy trade-offs for $p$-mean-ANN at various $p$ values, for $k = 50$ on the Amazon and Deep1b-(Clus) datasets in the single-attribute setting. Columns 3 and 4 - Approximation ratio versus entropy trade-offs (across attribute classes $C_1$ and $C_2$) for Multi $p$-mean-ANN with varying $p$ on the Sift1m-(Clus) dataset with $k = 50$ in the multi-attribute setting.
  • Figure 6: The plots show approximation ratio versus entropy trade-offs for various algorithms for $k = 10$ (left) and $k = 50$ (right) in the single-attribute setting on the Sift1m-(Clus) dataset.
  • Figure 7: The plots show approximation ratio versus entropy trade-offs for various algorithms for $k = 10$ (left) and $k = 50$ (right) in the single-attribute setting on the Sift1m-(Prob) dataset.
  • ...and 23 more figures

Theorems & Definitions (37)

  • Definition 1: NaNNS
  • Example 1: Complete Diversity via NaNNS
  • Example 2: Complete Relevance via NaNNS
  • Definition 2: $p$-NNS
  • Theorem 1
  • Corollary 1
  • Theorem 2
  • Theorem 3
  • Lemma 4: Decreasing Marginals
  • Lemma 5
  • ...and 27 more