
Diversity-Aware $k$-Maximum Inner Product Search Revisited

Qiang Huang, Yanhao Wang, Yiqun Sun, Anthony K. H. Tung

TL;DR

This work revisits the diversity-aware $k$-Maximum Inner Product Search (D$k$MIPS) problem, redefining the objective to jointly maximize relevance and diversity within a single inner-product space. Diversity is measured in two ways: by the average and by the maximum pairwise inner product among the returned items. The paper introduces two linear-scan algorithms, Greedy and DualGreedy. Both carry data-dependent approximation guarantees, and for the average objective DualGreedy additionally attains a $1/4$ approximation ratio with an additive regularization term; both are optimized to run in $O(ndk)$ time. To enable real-time performance, the authors integrate a Ball-Cone Tree (BC-Tree) index, yielding BC-Greedy and BC-DualGreedy with effective pruning and competitive query times. Experiments on ten real-world datasets show that the proposed methods return more diverse yet still relevant results than prior approaches, with the BC-Tree variants offering substantial speedups suitable for large-scale applications. Overall, the paper demonstrates that diversity-aware formulations can be made practical and theoretically justified for high-dimensional recommendation tasks.

Abstract

The $k$-Maximum Inner Product Search ($k$MIPS) serves as a foundational component in recommender systems and various data mining tasks. However, while most existing $k$MIPS approaches prioritize the efficient retrieval of highly relevant items for users, they often neglect an equally pivotal facet of search results: diversity. To bridge this gap, we revisit and refine the diversity-aware $k$MIPS (D$k$MIPS) problem by incorporating two well-known diversity objectives (minimizing the average and maximum pairwise item similarities within the results) into the original relevance objective. This enhancement, inspired by Maximal Marginal Relevance (MMR), offers users a controllable trade-off between relevance and diversity. We introduce Greedy and DualGreedy, two linear scan-based algorithms tailored for D$k$MIPS. They both achieve data-dependent approximations and, when aiming to minimize the average pairwise similarity, DualGreedy attains an approximation ratio of $1/4$ with an additive term for regularization. To further improve query efficiency, we integrate a lightweight Ball-Cone Tree (BC-Tree) index with the two algorithms. Finally, comprehensive experiments on ten real-world data sets demonstrate the efficacy of our proposed methods, showcasing their capability to efficiently deliver diverse and relevant search results to users.
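
The relevance/diversity trade-off described in the abstract can be illustrated with a minimal greedy sketch. This is not the paper's Greedy or DualGreedy algorithm: the function name `greedy_diverse_mips`, the parameter `lam`, and the exact marginal score (relevance weighted by `lam`, minus the mean inner product with already-selected items weighted by `1 - lam`) are assumptions chosen for illustration in the MMR style. Keeping a running sum of inner products with the selected items makes each round cost one pass over the data, which is one way to reach the $O(ndk)$ total mentioned in the summary.

```python
import numpy as np

def greedy_diverse_mips(items, query, k, lam=0.5):
    """Illustrative MMR-style greedy selection (an assumed objective, not the paper's).

    At each step, add the item p maximizing
        lam * <p, q> - (1 - lam) * mean_{s in selected} <p, s>.
    Returns the indices of the selected rows of `items`, in selection order.
    """
    n = items.shape[0]
    relevance = items @ query              # <p, q> for every item, O(nd)
    sim_sum = np.zeros(n)                  # running sum of <p, s> over selected s
    chosen = np.zeros(n, dtype=bool)
    selected = []
    for _ in range(min(k, n)):
        # Mean similarity to the current result set (zero while it is empty).
        penalty = sim_sum / max(len(selected), 1)
        score = lam * relevance - (1.0 - lam) * penalty
        score[chosen] = -np.inf            # never pick the same item twice
        best = int(np.argmax(score))
        selected.append(best)
        chosen[best] = True
        sim_sum += items @ items[best]     # O(nd) update per round, O(ndk) overall
    return selected
```

For example, with items $(1, 0)$, $(1, 0)$, $(0, 0.9)$ and query $(1, 0.5)$, calling the function with `k=2` and `lam=1.0` returns the two duplicate items `[0, 1]`, whereas `lam=0.5` returns `[0, 2]`, swapping the duplicate for the less similar item.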

Paper Structure

This paper contains 43 sections, 8 theorems, 10 equations, 9 figures, 5 tables, 4 algorithms.

Key Result

Theorem 1

Let $\mathcal{S}$ be the D$k$MIPS result by Greedy and $\mathcal{S}'$ be the $k$MIPS result for $\bm{q}$. Define $\overline{f}(\mathcal{S}) := \frac{\lambda}{k} \sum_{\bm{p} \in \mathcal{S}} \langle \bm{p}, \bm{q} \rangle$ and $\underline{f}(\mathcal{S}) := \frac{\lambda}{k} \sum_{\bm{p}\in \mathcal

Figures (9)

  • Figure 1: Comparison of the results provided by $k$MIPS and D$k$MIPS on the MovieLens data set when $k = 10$. In Fig. \ref{fig:motivation:posters}, we display posters of ten randomly user-rated movies, together with those returned by $k$MIPS (using Linear) and D$k$MIPS (using DualGreedy-Avg). In Fig. \ref{fig:motivation:hist}, we present histograms showing the genre distribution of all user-rated movies and those returned by both methods. See Section \ref{sect:expt:case_study} for detailed results and analyses.
  • Figure 2: Illustration of point-level ball bound (green dashed line) and point-level cone bound (blue dashed line) for an item vector $\bm{p}$ in the leaf node. From the red triangle, we observe that $(\Vert \bm{p} \Vert \sin\varphi_{\bm{p}})^2 + (\Vert \bm{c} \Vert - \Vert \bm{p} \Vert \cos\varphi_{\bm{p}})^2 = r_{\bm{p}}^2$.
  • Figure 3: Query performance vs. $\lambda$ for the objective function $f_{avg}(\cdot)$ ($k=10$).
  • Figure 4: Query performance vs. $\lambda$ for the objective function $f_{max}(\cdot)$ ($k=10$).
  • Figure 5: Query performance vs. $k$ for the objective function $f_{avg}(\cdot)$ ($\lambda=0.5$).
  • ...and 4 more figures
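
The identity quoted in the Figure 2 caption can be checked with a short derivation. The caption does not define its symbols, so this sketch assumes that $r_{\bm{p}} = \Vert \bm{p} - \bm{c} \Vert$ is the distance from the item $\bm{p}$ to the leaf center $\bm{c}$, and that $\varphi_{\bm{p}}$ is the angle between $\bm{p}$ and $\bm{c}$, so that $\langle \bm{p}, \bm{c} \rangle = \Vert \bm{p} \Vert \Vert \bm{c} \Vert \cos\varphi_{\bm{p}}$:

$$
\begin{aligned}
r_{\bm{p}}^2 = \Vert \bm{p} - \bm{c} \Vert^2
&= \Vert \bm{p} \Vert^2 - 2 \Vert \bm{p} \Vert \Vert \bm{c} \Vert \cos\varphi_{\bm{p}} + \Vert \bm{c} \Vert^2 \\
&= \Vert \bm{p} \Vert^2 \sin^2\varphi_{\bm{p}} + \Vert \bm{p} \Vert^2 \cos^2\varphi_{\bm{p}} - 2 \Vert \bm{p} \Vert \Vert \bm{c} \Vert \cos\varphi_{\bm{p}} + \Vert \bm{c} \Vert^2 \\
&= (\Vert \bm{p} \Vert \sin\varphi_{\bm{p}})^2 + (\Vert \bm{c} \Vert - \Vert \bm{p} \Vert \cos\varphi_{\bm{p}})^2 .
\end{aligned}
$$

Under these assumptions the identity is the law of cosines, rewritten as the Pythagorean relation for the right triangle whose legs are the components of $\bm{p} - \bm{c}$ perpendicular and parallel to $\bm{c}$.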

Theorems & Definitions (12)

  • Definition 1: $k$MIPS
  • Definition 2: D$k$MIPS \cite{hirata2022solving}
  • Definition 3: D$k$MIPS, revisited
  • Theorem 1
  • Lemma 1
  • Example 1
  • Theorem 2
  • Theorem 3
  • Theorem 4: Node-Level Ball Bound
  • Corollary 1: Point-Level Ball Bound
  • ...and 2 more