Ranking Items from Discrete Ratings: The Cost of Unknown User Thresholds

Oscar Villemaud, Suryanarayana Sankagiri, Matthias Grossglauser

TL;DR

This paper investigates how hard it is to derive fine-grained item rankings from coarse, discrete ratings when each user applies a latent threshold to a common item score. Modeling items by scores $X_i$ and users by thresholds $Y_u$, the authors show that even with sequential querying, exact ranking is impossible in expectation, and they quantify partial ranking quality using the Maximum Spearman Footrule (MSF). They prove lower bounds showing that near-perfect ranking requires $\Omega(n^2)$ users in the linear-to-quadratic regime and provide a near-tight algorithm, Threshold Binary Search (TBS), achieving $\mathcal{O}(n\log(m) + m\log(n))$ queries up to log factors. The results highlight a fundamental cost of discretization: threshold diversity is essential to merge coarse ratings into a fine ranking, and mismatches between score and threshold distributions (captured by a quadratic divergence) worsen the complexity. Empirical results on Beta-distributed scores and thresholds corroborate the theory in both linear and quadratic regimes and illustrate the practical implications for choosing between ratings and comparisons in ranking systems.

Abstract

Ranking items is a central task in many information retrieval and recommender systems. User input for the ranking task often comes in the form of ratings on a coarse discrete scale. We ask whether it is possible to recover a fine-grained item ranking from such coarse-grained ratings. We model items as having scores and users as having thresholds; a user rates an item positively if the item's score exceeds the user's threshold. Although all users agree on the total item order, estimating that order is challenging when both the scores and the thresholds are latent. Under our model, any ranking method naturally partitions the $n$ items into bins; the bins are ordered, but the items inside each bin are still unordered. Users arrive sequentially, and every new user can be queried to refine the current ranking. We prove that achieving a near-perfect ranking, measured by Spearman distance, requires $\Theta(n^2)$ users (and therefore $\Omega(n^2)$ queries). This is significantly worse than the $O(n\log n)$ queries needed to rank from comparisons; the gap reflects the additional queries needed to identify the users who have the appropriate thresholds. Our bound also quantifies the impact of a mismatch between score and threshold distributions via a quadratic divergence factor. To show the tightness of our results, we provide a ranking algorithm whose query complexity matches our bound up to a logarithmic factor. Our work reveals a tension in online ranking: diversity in thresholds is necessary to merge coarse ratings from many users into a fine-grained ranking, but this diversity has a cost if the thresholds are a priori unknown.
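
The rating model in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it naively asks every user about every item, and the names `rate` and `bins_from_ratings` are hypothetical. It only shows how binary threshold ratings partition items into ordered bins whose contents remain unordered.

```python
def rate(score: float, threshold: float) -> int:
    """Binary rating: 1 if the item's score exceeds the user's threshold."""
    return 1 if score > threshold else 0


def bins_from_ratings(scores, thresholds):
    """Partition items into ordered bins using only the binary ratings.

    Scores and thresholds are latent in the model; they appear here only
    to generate the ratings. An item's bin is identified by how many users
    rate it positively, which equals the number of thresholds below its score.
    """
    by_count = {}
    for item, x in enumerate(scores):
        positives = sum(rate(x, y) for y in thresholds)
        by_count.setdefault(positives, []).append(item)
    # Bins are returned from lowest-scored to highest-scored.
    return [by_count[k] for k in sorted(by_count)]


# Four items, two users. Items 2 and 3 fall between the same two thresholds,
# so no rating distinguishes them and they share a bin.
print(bins_from_ratings([0.9, 0.2, 0.5, 0.55], [0.3, 0.7]))
# → [[1], [2, 3], [0]]
```

Separating items 2 and 3 would require a user whose threshold lies between 0.5 and 0.55, which is the kind of user the abstract says is costly to find when thresholds are unknown.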

Paper Structure

This paper contains 57 sections, 29 theorems, 152 equations, 4 figures, 4 algorithms.

Key Result

Lemma 1

Let $X_1, \ldots, X_n$ be iid item scores of density $f_X$ on $[0,1]$. Let $M$ be the random number of users needed to obtain a total order. Then we have:

Figures (4)

  • Figure 1: The item scores and user thresholds are sampled iid, respectively with densities $f_X$ and $f_Y$.
  • Figure 2: Experiments on the expected MSF and number of queries when the number of users is linear in the number of items. The solid lines in panel (a) are the theoretical values from Theorem \ref{thm:msf_high}. Panel (b) shows the empirical query cost of $\texttt{TBS}$. The label 'mis' marks the experiments with a mismatch between the distributions of the items and the thresholds. We use $a_X=2, b_X=3, a_Y=2, b_Y=2$ for the mismatch case and $a_X=1, b_X=1, a_Y=1, b_Y=1$ for the default case. In panel (b), confidence intervals are too small to display, and the lines for the mismatch case overlap with their counterparts for the matching case.
  • Figure 3: Experiments on the expected MSF and number of queries when the number of users is quadratic in the number of items. The solid lines in panel (a) are the theoretical values from Theorem \ref{thm:msf_low}. Panel (b) shows the empirical query cost of $\texttt{TBS}$. The solid black line shows $y = 2 n^2 \log(n)$, which corresponds to the rate estimated in Section \ref{sec:algorithm} (the constant $2$ is chosen manually to roughly match the other lines). The label 'mis' marks the experiments with a mismatch between the distributions of the items and the thresholds. We use $a_X=2, b_X=3, a_Y=2, b_Y=2$ for the mismatch case and $a_X=1, b_X=1, a_Y=1, b_Y=1$ for the default case.
  • Figure 4: Illustration of the definitions for a fixed $u$. We omit the dependence on $u$ in the figure for readability. All random variables shown depend on $u$.
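
The experimental setup described in the captions of Figures 2 and 3 can be approximated with the following Python sketch. It assumes that $a_X, b_X, a_Y, b_Y$ are the shape parameters of the Beta distributions mentioned in the TL;DR. It does not compute the MSF or run $\texttt{TBS}$; instead it reports a simpler proxy chosen for this illustration: the fraction of adjacent item pairs (in score order) that at least one threshold separates. The function names and the choice $n = 50$ are hypothetical.

```python
import random


def sample_beta(n, a, b, rng):
    """Draw n iid Beta(a, b) samples with the given random generator."""
    return [rng.betavariate(a, b) for _ in range(n)]


def separated_fraction(scores, thresholds):
    """Fraction of adjacent item pairs (in score order) split by a threshold.

    A threshold y splits scores lo < hi when lo <= y < hi, because a user
    rates positively only if the score strictly exceeds the threshold.
    """
    xs, ys = sorted(scores), sorted(thresholds)
    split, j = 0, 0
    for lo, hi in zip(xs, xs[1:]):
        # Advance to the first threshold that is not below lo.
        while j < len(ys) and ys[j] < lo:
            j += 1
        if j < len(ys) and ys[j] < hi:
            split += 1
    return split / (len(xs) - 1)


# Parameters taken from the figure captions: (a_X, b_X, a_Y, b_Y).
configs = {"default": (1, 1, 1, 1), "mis": (2, 3, 2, 2)}
rng = random.Random(0)
n = 50  # illustrative number of items, not from the paper
for label, (ax, bx, ay, by) in configs.items():
    scores = sample_beta(n, ax, bx, rng)
    for regime, m in (("linear", n), ("quadratic", n * n)):
        thresholds = sample_beta(m, ay, by, rng)
        frac = separated_fraction(scores, thresholds)
        print(f"{label:8s} {regime:9s} m={m:5d} separated={frac:.2f}")
```

A total order corresponds to a separated fraction of 1. Comparing the linear and quadratic rows gives a rough feel for why many more users than items are needed before most adjacent pairs are distinguished.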

Theorems & Definitions (63)

  • Lemma 1
  • Definition 1
  • Definition 2
  • Definition 3
  • Theorem 1
  • Theorem 2
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • ...and 53 more