How to Mine Potentially Popular Items? A Reverse MIPS-based Approach

Daichi Amagata, Kazuyoshi Aoyama, Keito Kido, Sumio Fujita

TL;DR

This work addresses finding the top-$N$ items with the largest reverse $k$-MIPS score, enabling item-centric popularity analysis. It introduces a novel exact algorithm that combines a tight offline upper bound with an efficient online pruning strategy, achieving substantial speedups over baselines. The offline phase computes $uscore_k(p)$ bounds in $O(nm)$ time independent of dimension, and the online phase uses incremental reverse $k$-MIPS to accurately accumulate scores while filtering candidates. Experiments on large real-world datasets demonstrate interactive query times and orders-of-magnitude faster performance than competing methods, validating the approach's practicality for scalable item popularity analysis. The method enables robust, item-centric insights for recommender systems, market analysis, and new-item development by efficiently identifying potentially popular items early in the pipeline.

Abstract

The $k$-MIPS ($k$ Maximum Inner Product Search) problem has been employed in many fields. Recently, its reverse version, the reverse $k$-MIPS problem, has been proposed. Given an item vector (i.e., query), it retrieves all user vectors such that their $k$-MIPS results contain the item vector. Consider the cardinality of a reverse $k$-MIPS result. A large cardinality means that the item is potentially popular, because it is included in the $k$-MIPS results of many users. This mining is important in recommender systems, market analysis, and new item development. Motivated by this, we formulate a new problem. In this problem, the score of each item is defined as the cardinality of its reverse $k$-MIPS result, and the $N$ items with the highest score are retrieved. A straightforward approach is to compute the scores of all items, but this is clearly prohibitive for large numbers of users and items. We remove this inefficiency issue and propose a fast algorithm for this problem. Because the main bottleneck of the problem is to compute the score of each item, we devise a new upper-bounding technique that is specific to our problem and filters unnecessary score computations. We conduct extensive experiments on real datasets and show the superiority of our algorithm over competitors.
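The straightforward approach mentioned in the abstract can be sketched as follows. This is only a brute-force illustration of the problem definition, not the proposed algorithm. The function names `reverse_kmips_scores` and `top_n_items` and the dense NumPy representation are assumptions made for the example, and ties among inner products are broken arbitrarily.

```python
import numpy as np


def reverse_kmips_scores(users: np.ndarray, items: np.ndarray, k: int) -> np.ndarray:
    """Return, for each item, how many users have it in their k-MIPS result.

    users: (n_users, d) array, items: (n_items, d) array, 1 <= k <= n_items.
    """
    ip = users @ items.T  # all user-item inner products
    # Column indices of the k largest inner products in each row (unordered).
    topk = np.argpartition(-ip, k - 1, axis=1)[:, :k]
    # An item's score is the number of user top-k lists that contain it.
    return np.bincount(topk.ravel(), minlength=items.shape[0])


def top_n_items(users: np.ndarray, items: np.ndarray, k: int, n_top: int):
    """Return the n_top (item index, score) pairs with the highest scores."""
    scores = reverse_kmips_scores(users, items, k)
    order = np.argsort(-scores, kind="stable")[:n_top]
    return [(int(j), int(scores[j])) for j in order]
```

Every user-item inner product is computed here, which is the cost that becomes prohibitive for large numbers of users and items.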

Paper Structure

This paper contains 23 sections, 2 theorems, 8 equations, 8 figures, 4 tables, 2 algorithms.

Key Result

Theorem 1

The space and time complexities of the pre-processing algorithm are $O((n+m)d)$ and $O(nm)$, respectively.
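The TL;DR describes the output of pre-processing as upper bounds $uscore_k(p)$ on each item's score, which the online phase uses to skip score computations. A generic filter-and-verify loop built on such bounds might look like the sketch below. It is not the paper's online algorithm, which is based on incremental reverse $k$-MIPS. The names `top_n_with_upper_bounds`, `upper`, and `exact_score` are hypothetical, and the sketch assumes `upper[j]` is never smaller than the true score of item `j`.

```python
import heapq
from typing import Callable, List, Sequence, Tuple


def top_n_with_upper_bounds(
    upper: Sequence[int],
    exact_score: Callable[[int], int],
    n_top: int,
) -> Tuple[List[Tuple[int, int]], int]:
    """Return the top-n_top (item, score) pairs and the number of exact evaluations.

    Assumes upper[j] >= exact_score(j) for every item j (hypothetical inputs).
    """
    # Visit items in descending order of their upper bound.
    order = sorted(range(len(upper)), key=lambda j: -upper[j])
    heap: List[Tuple[int, int]] = []  # min-heap of (score, item); root is the N-th best
    evaluated = 0
    for j in order:
        # Once the N-th best exact score reaches the largest remaining bound,
        # no unvisited item can strictly improve the result.
        if len(heap) == n_top and upper[j] <= heap[0][0]:
            break
        score = exact_score(j)
        evaluated += 1
        if len(heap) < n_top:
            heapq.heappush(heap, (score, j))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, j))
    result = sorted(((j, s) for s, j in heap), key=lambda t: -t[1])
    return result, evaluated
```

The tighter the bounds, the earlier the loop stops, which is why the quality of the offline bound matters for query time.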

Figures (8)

  • Figure 1: Illustration of Example 1
  • Figure 2: Relationship between $x_{i}$ and ($r_{i} + 1 - B_{1} / n$)
  • Figure 3: Concrete examples of the result of Equation (\ref{eq:f_x})
  • Figure 4: Score distribution of each dataset
  • Figure 5: Impact of $N$: "$\times$" shows LEMP, "$\circ$" shows FEXIPRO, "$\triangle$" shows Simpfer, and "$\triangledown$" shows Ours.
  • ...and 3 more figures

Theorems & Definitions (9)

  • Definition 1: $k$-MIPS problem
  • Definition 2: Reverse $k$-MIPS problem
  • Definition 3: Top-$N$ item search based on reverse $k$-MIPS result size
  • Example 1
  • Theorem 1
  • Remark 1
  • Theorem 2
  • Remark 2
  • Remark 3