Welfarist Formulations for Diverse Similarity Search
Siddharth Barman, Nirjhar Das, Shivam Gupta, Kirankumar Shiragur
TL;DR
This work introduces a welfare-based framework for diversity in nearest neighbor search (NNS): attributes are modeled as agents, and their utilities are defined by query-vector similarities, yielding objectives based on Nash social welfare (NSW). The framework generalizes to a family of p-mean welfare objectives that interpolate between pure relevance and diversity. It comes with algorithms that have provable guarantees: a greedy method that optimizes NSW in the single-attribute setting and works with either exact or approximate nearest neighbor search, and a greedy method for the multi-attribute setting that achieves a (1-1/e) guarantee on the log-NSW objective even though the problem is NP-hard. Unlike prior work, the approach does not impose fixed quota constraints; instead it yields query-dependent trade-offs between relevance and diversity, as demonstrated on real and semi-synthetic datasets. The results show that NSW and p-mean welfare balance diversity and relevance effectively, with practical runtimes, and that variants such as the p-FetchUnion-ANN heuristic offer substantial speedups. Overall, the paper provides a principled, flexible, and scalable solution to diverse NNS, with theoretical guarantees and empirical validation across single- and multi-attribute settings.
Abstract
Nearest Neighbor Search (NNS) is a fundamental problem in data structures with wide-ranging applications, such as web search, recommendation systems, and, more recently, retrieval-augmented generation (RAG). In such recent applications, in addition to the relevance (similarity) of the returned neighbors, diversity among the neighbors is a central requirement. In this paper, we develop principled welfare-based formulations in NNS for realizing diversity across attributes. Our formulations are based on welfare functions -- from mathematical economics -- that satisfy central diversity (fairness) and relevance (economic efficiency) axioms. With a particular focus on Nash social welfare, we note that our welfare-based formulations provide objective functions that adaptively balance relevance and diversity in a query-dependent manner. Notably, such a balance was not present in the prior constraint-based approach, which forced a fixed level of diversity and optimized for relevance. In addition, our formulation provides a parametric way to control the trade-off between relevance and diversity, giving practitioners the flexibility to tailor search results to task-specific requirements. We develop efficient nearest neighbor algorithms with provable guarantees for the welfare-based objectives. Notably, our algorithm can be applied on top of any standard ANN method (i.e., using a standard ANN method as a subroutine) to efficiently find neighbors that approximately maximize our welfare-based objectives. Experimental results demonstrate that our approach is practical and substantially improves diversity while maintaining high relevance of the retrieved neighbors.
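To make the welfare-based objective concrete, the following is a minimal Python sketch and not the paper's implementation. It assumes each candidate carries exactly one attribute, that an attribute's utility is the sum of the (nonnegative) similarities of the selected candidates carrying it, and that a smoothing constant `eta` keeps the logarithm finite for attributes with no selected candidate. The function names, the `eta` parameter, and the toy data are all hypothetical; the exact utility definition used in the paper is not stated in this abstract. The candidate list stands in for the output of any ANN subroutine.

```python
import math


def p_mean_welfare(utilities, p):
    """Generalized p-mean of positive utilities.

    p = 1 is the arithmetic mean (pure relevance); p = 0 is taken as the
    limiting case, the geometric mean, which corresponds to Nash social welfare.
    """
    n = len(utilities)
    if p == 0:
        return math.exp(sum(math.log(u) for u in utilities) / n)
    return (sum(u ** p for u in utilities) / n) ** (1.0 / p)


def greedy_log_nsw(candidates, k, eta=1.0):
    """Greedily pick k candidates by marginal gain in sum_a log(utility_a + eta).

    candidates: list of (item_id, attribute, similarity) tuples, e.g. the
    results of an ANN query (assumed input format, similarity >= 0).
    """
    utility = {attr: 0.0 for _, attr, _ in candidates}
    remaining = list(candidates)
    chosen = []

    def gain(cand):
        # Increase in log-NSW if this candidate were added to the selection.
        _, attr, sim = cand
        return math.log(utility[attr] + eta + sim) - math.log(utility[attr] + eta)

    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=gain)
        remaining.remove(best)
        utility[best[1]] += best[2]
        chosen.append(best[0])
    return chosen


# Toy data: three highly similar "red" items, one "blue", one weak "green".
toy = [
    ("a1", "red", 0.9),
    ("a2", "red", 0.85),
    ("a3", "red", 0.8),
    ("b1", "blue", 0.5),
    ("c1", "green", 0.3),
]
print(greedy_log_nsw(toy, k=3))  # ['a1', 'b1', 'a2']
```

On this toy input, pure top-3 relevance would return only "red" items, whereas the log-NSW greedy includes the "blue" item but skips the weakly relevant "green" one. This illustrates the kind of query-dependent balance described above, as opposed to a fixed per-attribute quota.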
