Welfarist Formulations for Diverse Similarity Search

Siddharth Barman; Nirjhar Das; Shivam Gupta; Kirankumar Shiragur

TL;DR

This work introduces a welfare-based framework for diversity in nearest neighbor search (NNS). Attributes are modeled as agents whose utilities are derived from query-vector similarities, which yields objectives based on Nash social welfare (NSW). The framework generalizes to a family of $p$-mean welfare objectives that interpolate between pure relevance and diversity. The paper provides algorithms with provable guarantees: in the single-attribute setting, a greedy method that optimizes NSW and works with either an exact or an approximate nearest neighbor oracle; in the multi-attribute setting, where the problem is NP-hard, a greedy approach with a $(1-1/e)$ guarantee on the log-NSW objective. Unlike prior constraint-based work, the approach avoids fixed quotas and yields query-dependent trade-offs between relevance and diversity. Experiments on real and semi-synthetic datasets, in both single- and multi-attribute settings, indicate that NSW and $p$-mean welfare balance diversity and relevance with practical running times, and that heuristic variants such as p-FetchUnion-ANN offer substantial speedups.
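To make the interpolation between relevance and diversity concrete, the following minimal sketch evaluates the standard generalized $p$-mean from the welfare literature on per-attribute utilities. It is an illustration under stated assumptions, not the paper's definition: the function name `p_mean_welfare`, the toy utility vectors, and the requirement that utilities be strictly positive (the paper's exact utility definition and any smoothing term are not reproduced in this summary) are all hypothetical.

```python
import math


def p_mean_welfare(utilities, p):
    """Generalized p-mean of strictly positive utilities (illustrative sketch).

    p = 1     -> arithmetic mean (total relevance, ignores balance)
    p = 0     -> geometric mean (Nash social welfare)
    p = -inf  -> minimum (egalitarian welfare)
    """
    if any(u <= 0 for u in utilities):
        # Assumption: callers add a small smoothing term if a utility can be zero.
        raise ValueError("utilities must be strictly positive")
    c = len(utilities)
    if p == 0:
        return math.exp(sum(math.log(u) for u in utilities) / c)
    if p == -math.inf:
        return min(utilities)
    return (sum(u ** p for u in utilities) / c) ** (1.0 / p)


# Two hypothetical utility profiles with the same total relevance.
balanced = [2.0, 2.0]   # both attributes are served equally
skewed = [3.9, 0.1]     # one attribute receives almost everything

for p in (1, 0, -1, -math.inf):
    print(p, p_mean_welfare(balanced, p), p_mean_welfare(skewed, p))
```

At $p = 1$ the two profiles score the same, so only total relevance matters; as $p$ decreases, the skewed profile is penalized more heavily, which is the sense in which smaller $p$ favors diversity.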

Abstract

Nearest Neighbor Search (NNS) is a fundamental problem in data structures with wide-ranging applications, such as web search, recommendation systems, and, more recently, retrieval-augmented generation (RAG). In such recent applications, in addition to the relevance (similarity) of the returned neighbors, diversity among the neighbors is a central requirement. In this paper, we develop principled welfare-based formulations in NNS for realizing diversity across attributes. Our formulations are based on welfare functions from mathematical economics that satisfy central diversity (fairness) and relevance (economic efficiency) axioms. With a particular focus on Nash social welfare, we note that our welfare-based formulations provide objective functions that adaptively balance relevance and diversity in a query-dependent manner. Notably, such a balance was not present in the prior constraint-based approach, which forced a fixed level of diversity and optimized for relevance. In addition, our formulation provides a parametric way to control the trade-off between relevance and diversity, giving practitioners the flexibility to tailor search results to task-specific requirements. We develop efficient nearest neighbor algorithms with provable guarantees for the welfare-based objectives. Notably, our algorithm can be applied on top of any standard ANN method (i.e., it uses a standard ANN method as a subroutine) to efficiently find neighbors that approximately maximize our welfare-based objectives. Experimental results demonstrate that our approach is practical and substantially improves diversity while maintaining high relevance of the retrieved neighbors.

Paper Structure (35 sections, 16 theorems, 41 equations, 28 figures, 9 tables, 3 algorithms)

Introduction
Problem Formulation and Main Results
Our Results
Diversity in Single- and Multi-Attribute Settings.
Algorithmic Results for Single-Attribute NaNNS and $p$-NNS.
Algorithmic Results for Multi-Attribute NaNNS.
Experimental Validation of our Formulation and Algorithms.
NaNNS in the Single-Attribute Setting
Proofs of Theorem \ref{theorem:single-attribute} and Corollary \ref{theorem:single-attribute-approx-oracle}
NaNNS in the Multi-Attribute Setting
Proof of Theorem \ref{theorem:multi-attribute-hardness}
Algorithm for the Multi-Attribute Setting
Proof of Theorem \ref{theorem:multi-attribute-greedy-guarantee}
Experimental Evaluations
Metrics for Measuring Relevance and Diversity
...and 20 more sections

Key Result

Theorem 1

In the single-attribute setting, given any query $q \in \mathbb{R}^d$ and an (exact) oracle ENN for the $k$ most similar vectors from any set, Algorithm \ref{algo:greedy-nash-ann} (Nash-ANN) returns an optimal solution for NaNNS, i.e., it returns a size-$k$ subset $\textsc{Alg} \subseteq P$ that satisfies …
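The kind of greedy selection this theorem refers to can be sketched as follows. This is a rough illustration under several assumptions and not the paper's Algorithm Nash-ANN: the objective is taken to be the sum over attributes of $\log(\text{utility} + \eta)$ with a hypothetical smoothing constant `eta`, an attribute's utility is assumed to be the sum of similarities of its selected vectors, similarity is a clamped inner product, and a brute-force per-attribute top-$k$ scan stands in for the nearest neighbor oracle. The function name `greedy_nash_sketch` and the toy data are likewise hypothetical.

```python
import heapq
import math


def greedy_nash_sketch(points, query, k, eta=1e-3):
    """Greedily pick k indices to increase sum_attr log(utility[attr] + eta).

    points: list of (vector, attribute) pairs, one attribute per vector.
    """
    def sim(vec):
        # Assumed similarity: inner product with the query, clamped at zero.
        return max(0.0, sum(a * b for a, b in zip(vec, query)))

    # Stand-in for the oracle: the k most similar candidates per attribute,
    # sorted from most to least similar.
    by_attr = {}
    for idx, (vec, attr) in enumerate(points):
        by_attr.setdefault(attr, []).append((sim(vec), idx))
    for attr in by_attr:
        by_attr[attr] = heapq.nlargest(k, by_attr[attr])

    utility = {attr: 0.0 for attr in by_attr}   # accumulated similarity
    cursor = {attr: 0 for attr in by_attr}      # next unused candidate
    chosen = []
    for _ in range(min(k, len(points))):
        best_attr, best_gain = None, -math.inf
        for attr, cands in by_attr.items():
            if cursor[attr] >= len(cands):
                continue  # this attribute has no candidates left
            s, _ = cands[cursor[attr]]
            # Marginal gain in the log objective from adding this candidate.
            gain = math.log(utility[attr] + s + eta) - math.log(utility[attr] + eta)
            if gain > best_gain:
                best_attr, best_gain = attr, gain
        s, idx = by_attr[best_attr][cursor[best_attr]]
        chosen.append(idx)
        utility[best_attr] += s
        cursor[best_attr] += 1
    return chosen


# Toy data: every color is relevant to the query, so the picks spread out.
all_relevant = [([1.0, 0.0], "blue"), ([0.9, 0.0], "blue"), ([0.8, 0.0], "blue"),
                ([0.7, 0.0], "red"), ([0.6, 0.0], "green")]
# Toy data: only blue items are relevant, so the picks stay with blue.
only_blue = [([1.0, 0.0], "blue"), ([0.9, 0.0], "blue"), ([0.8, 0.0], "blue"),
             ([0.0, 1.0], "red"), ([-1.0, 0.0], "green")]

print(greedy_nash_sketch(all_relevant, [1.0, 0.0], 3))
print(greedy_nash_sketch(only_blue, [1.0, 0.0], 3))
```

On the first toy input the sketch selects one item per color, and on the second it selects only blue items. This matches the query-dependent behavior described for Figure 1, although the toy data here is invented for illustration.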

Figures (28)

Figure 1: Neighbor search results ($k=9$) on the Amazon dataset. From left: First and Second images - ANN and Nash-based results for query "shirts", respectively. Third and Fourth images - ANN and Nash-based results for query "blue shirt", respectively. Note that the Nash-based method selects diverse colors for the query "shirts" but conforms to the blue color for the query "blue shirt".
Figure 4: Columns 1 and 2 - Comparison of approximation ratio versus entropy trade-offs between Nash-ANN and Div-ANN with varying $k'$, for $k = 50$ on the Amazon and Deep1b-(Clus) datasets in the single-attribute setting. Columns 3 and 4 - Comparison of approximation ratio versus entropy trade-offs (across attribute classes $C_1$ and $C_2$) between Multi Nash-ANN and Multi Div-ANN with varying $k'$ on the Sift1m-(Clus) dataset with $k = 50$ in the multi-attribute setting.
Figure 5: Columns 1 and 2 - Approximation ratio versus entropy trade-offs for $p$-mean-ANN at various $p$ values, for $k = 50$ on the Amazon and Deep1b-(Clus) datasets in the single-attribute setting. Columns 3 and 4 - Approximation ratio versus entropy trade-offs (across attribute classes $C_1$ and $C_2$) for Multi $p$-mean-ANN with varying $p$ on the Sift1m-(Clus) dataset with $k = 50$ in the multi-attribute setting.
Figure 6: Approximation ratio versus entropy trade-offs for various algorithms for $k = 10$ (left) and $k = 50$ (right) in the single-attribute setting on the Sift1m-(Clus) dataset.
Figure 7: Approximation ratio versus entropy trade-offs for various algorithms for $k = 10$ (left) and $k = 50$ (right) in the single-attribute setting on the Sift1m-(Prob) dataset.
...and 23 more figures

Theorems & Definitions (37)

Definition 1: NaNNS
Example 1: Complete Diversity via NaNNS
Example 2: Complete Relevance via NaNNS
Definition 2: $p$-NNS
Theorem 1
Corollary 1
Theorem 2
Theorem 3
Lemma 4: Decreasing Marginals
Lemma 5
...and 27 more
