Approximate Reverse $k$-Ranks Queries in High Dimensions
Daichi Amagata, Kazuyoshi Aoyama, Keito Kido, Sumio Fujita
TL;DR
The paper tackles reverse $k$-ranks queries in high-dimensional inner-product spaces by introducing a $c$-approximate variant and a novel rank-table-based algorithm. By constructing per-user lower and upper bounds on ranks from a rank-table and thresholded inner products, the algorithm enables pruning and interpolation that reduce online processing to $O(nd)$ time, avoiding the $O(nmd)$ cost of prior approaches. The offline preprocessing uses norm-based bucketing and random sampling to estimate rank-table entries efficiently, and it completes setup faster than the leading method, QSRP. Empirical results on real-world datasets show substantial speedups, strong accuracy, and robustness to $k$ and $c$, highlighting practical value for item-centric recommendation and targeted search in high dimensions.
Abstract
Many objects are now represented as high-dimensional vectors. In this setting, the relevance between two objects (vectors) is usually evaluated by their inner product. Recently, item-centric searches, which search for users relevant to query items, have attracted attention and have important applications, such as product promotion and market analysis. To support these applications, this paper considers reverse $k$-ranks queries. Given a query vector $\mathbf{q}$, an integer $k$, a set $\mathbf{U}$ of user vectors, and a set $\mathbf{P}$ of item vectors, this query retrieves the $k$ user vectors $\mathbf{u} \in \mathbf{U}$ with the highest rank $r(\mathbf{q},\mathbf{u},\mathbf{P})$ (that is, the numerically smallest values), where $r(\mathbf{q},\mathbf{u},\mathbf{P})$ denotes the rank of $\mathbf{q}$ for $\mathbf{u}$ among $\mathbf{P}$. Because efficiently computing the exact answer to this query is difficult in high dimensions, we address the problem of approximate reverse $k$-ranks queries. Informally, given an approximation factor $c$, this problem allows, as an output, a user $\mathbf{u}'$ such that $r(\mathbf{q},\mathbf{u}',\mathbf{P}) > \tau$ but $r(\mathbf{q},\mathbf{u}',\mathbf{P}) \leq c \times \tau$, where $\tau$ is the rank threshold for the exact answer. We propose a new algorithm for solving this problem efficiently. Through theoretical and empirical analyses, we confirm the efficiency and effectiveness of our algorithm.
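To make the query definition concrete, the following is a minimal brute-force sketch, not the algorithm proposed in the paper. It assumes that $r(\mathbf{q},\mathbf{u},\mathbf{P})$ is one plus the number of items in $\mathbf{P}$ whose inner product with $\mathbf{u}$ strictly exceeds that of $\mathbf{q}$ (the tie-handling rule is an assumption), and that $\tau$ is the $k$-th smallest rank over all users. The function names `rank`, `reverse_k_ranks`, and `is_c_approximate` and the toy vectors are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

Vector = Sequence[float]


def inner(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank(q: Vector, u: Vector, items: Sequence[Vector]) -> int:
    """Assumed rank definition: 1 + number of items that u scores strictly above q."""
    score_q = inner(u, q)
    return 1 + sum(1 for p in items if inner(u, p) > score_q)


def reverse_k_ranks(q: Vector, k: int, users: Sequence[Vector],
                    items: Sequence[Vector]) -> List[int]:
    """Exact answer by exhaustive scan: indices of the k users with the smallest rank."""
    ranked = sorted((rank(q, u, items), i) for i, u in enumerate(users))
    return [i for _, i in ranked[:k]]


def is_c_approximate(answer: Sequence[int], q: Vector, k: int, c: float,
                     users: Sequence[Vector], items: Sequence[Vector]) -> bool:
    """Check the informal condition: k distinct users, each with rank <= c * tau."""
    tau = sorted(rank(q, u, items) for u in users)[k - 1]  # exact rank threshold
    return len(set(answer)) == k and all(
        rank(q, users[i], items) <= c * tau for i in answer
    )


# Toy data (hypothetical): three users and three items in two dimensions.
users = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
items = [(2.0, 0.0), (0.0, 2.0), (1.0, 1.0)]
q = (1.5, 0.5)

exact = reverse_k_ranks(q, 2, users, items)
```

The exhaustive scan computes one inner product per user-item pair, which corresponds to the $O(nmd)$ online cost that the approximate formulation is meant to avoid. In the toy data, a result containing the user with rank 3 is rejected for $c = 1$ but accepted for $c = 1.5$, because the exact threshold is $\tau = 2$.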
