Diversity-Aware $k$-Maximum Inner Product Search Revisited
Qiang Huang, Yanhao Wang, Yiqun Sun, Anthony K. H. Tung
TL;DR
This work revisits the diversity-aware $k$-Maximum Inner Product Search (D$k$MIPS) problem, redefining its objective to jointly maximize relevance and diversity within a single inner-product space. Diversity is measured in two ways: by the average and by the maximum pairwise inner product among the returned items, both of which are to be minimized. The paper introduces two linear-scan algorithms, Greedy and DualGreedy, and optimizes them to run in $O(ndk)$ time. Both have data-dependent approximation guarantees; for the average objective, DualGreedy also attains a $1/4$ approximation ratio with an additive regularization term. To speed up queries, the authors integrate a Ball-Cone Tree (BC-Tree) index, yielding BC-Greedy and BC-DualGreedy, which use pruning to reduce query time. Experiments on ten real-world datasets show that the proposed methods return more diverse yet still relevant results than prior approaches, and that the BC-Tree variants are substantially faster than their linear-scan counterparts. Overall, the paper argues that diversity-aware formulations can be both practical and theoretically grounded for high-dimensional recommendation tasks.
Abstract
The $k$-Maximum Inner Product Search ($k$MIPS) serves as a foundational component in recommender systems and various data mining tasks. However, while most existing $k$MIPS approaches prioritize the efficient retrieval of highly relevant items for users, they often neglect an equally pivotal facet of search results: \emph{diversity}. To bridge this gap, we revisit and refine the diversity-aware $k$MIPS (D$k$MIPS) problem by incorporating two well-known diversity objectives -- minimizing the average and maximum pairwise item similarities within the results -- into the original relevance objective. This enhancement, inspired by Maximal Marginal Relevance (MMR), offers users a controllable trade-off between relevance and diversity. We introduce \textsc{Greedy} and \textsc{DualGreedy}, two linear scan-based algorithms tailored for D$k$MIPS. They both achieve data-dependent approximations and, when aiming to minimize the average pairwise similarity, \textsc{DualGreedy} attains an approximation ratio of $1/4$ with an additive term for regularization. To further improve query efficiency, we integrate a lightweight Ball-Cone Tree (BC-Tree) index with the two algorithms. Finally, comprehensive experiments on ten real-world data sets demonstrate the efficacy of our proposed methods, showcasing their capability to efficiently deliver diverse and relevant search results to users.
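To make the MMR-style trade-off concrete, the following is a minimal sketch of a greedy linear-scan selection. It is an illustration under stated assumptions, not the paper's \textsc{Greedy} or \textsc{DualGreedy}: the scoring rule, the function name `greedy_dkmips`, and the trade-off weight `lam` are hypothetical, since the abstract does not state the exact objective. Each round adds the item with the best relevance-minus-similarity score and then updates the running similarities in $O(nd)$, for $O(ndk)$ overall.

```python
import numpy as np

def greedy_dkmips(items, query, k, lam=0.5, diversity="avg"):
    """Greedy MMR-style selection (illustrative; the scoring rule is an assumption).

    items: (n, d) array of item vectors; query: (d,) vector.
    lam: hypothetical trade-off weight in [0, 1]; 1.0 means relevance only.
    diversity: "avg" penalizes the mean inner product with the items already
               selected, "max" penalizes the largest one.
    Returns the indices of the selected items in selection order.
    """
    n = items.shape[0]
    relevance = items @ query            # <p, q> for every item, O(nd)
    sim_sum = np.zeros(n)                # running sum of <p, s> over selected s
    sim_max = np.full(n, -np.inf)        # running max of <p, s> over selected s
    chosen = np.zeros(n, dtype=bool)
    selected = []
    for _ in range(min(k, n)):
        if not selected:
            penalty = np.zeros(n)        # no diversity penalty for the first pick
        elif diversity == "avg":
            penalty = sim_sum / len(selected)
        else:
            penalty = sim_max
        score = lam * relevance - (1.0 - lam) * penalty
        score[chosen] = -np.inf          # never pick the same item twice
        best = int(np.argmax(score))
        selected.append(best)
        chosen[best] = True
        sims = items @ items[best]       # O(nd) incremental update per round
        sim_sum += sims
        sim_max = np.maximum(sim_max, sims)
    return selected

# Toy usage: items 0 and 1 are near-duplicates, item 2 points elsewhere.
items = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
query = np.array([1.0, 0.5])
print(greedy_dkmips(items, query, k=2, lam=1.0))  # relevance only
print(greedy_dkmips(items, query, k=2, lam=0.5))  # relevance and diversity
```

On this toy input, the relevance-only call returns the two near-duplicate items, while `lam=0.5` swaps the second of them for the dissimilar item, which is the controllable trade-off the abstract describes.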
