The Benefits of Diversity: Combining Comparisons and Ratings for Efficient Scoring

Julien Fageot, Matthias Grossglauser, Lê-Nguyên Hoang, Matteo Tacchi-Bénard, Oscar Villemaud

TL;DR

This work addresses how to elicit human preferences most efficiently by unifying direct ratings and pairwise comparisons in a single probabilistic framework. The proposed SCoRa model jointly reasons over embeddings, comparisons, and ratings using a generalized Bradley–Terry formulation with a learnable threshold, and it provides MAP guarantees, monotonicity, and Lipschitz resilience. The authors demonstrate convergence and robustness on synthetic data and uncover realistic regimes where mixing ratings and comparisons yields superior top-item scoring, particularly when active learning prioritizes comparisons among top entities. The findings offer a flexible, scalable basis for preference learning in applications like content recommendation and model alignment, with implications for how to allocate user effort across feedback types.

Abstract

Should humans be asked to evaluate entities individually or comparatively? This question has long been debated. In this work, we show that combining both forms of preference elicitation can outperform relying on a single kind. More specifically, we introduce SCoRa (Scoring from Comparisons and Ratings), a unified probabilistic model that learns from both signals. We prove that the MAP estimator of SCoRa is well-behaved: it satisfies monotonicity and robustness guarantees. We then show empirically that SCoRa recovers accurate scores, even under model mismatch. Most interestingly, we identify a realistic setting where combining comparisons and ratings outperforms using either one alone, in particular when the accurate ordering of top entities is critical. Given the de facto availability of signals of multiple forms, SCoRa additionally offers a versatile foundation for preference learning.

Paper Structure (50 sections, 13 theorems, 54 equations, 5 figures, 1 table)

Key Result

Proposition 2.2

The loss $\mathcal{L}$ admits an explicit form. Moreover, $\mathcal{L}$ is $\sigma_{\max}^{-2}$-strongly convex, where $\sigma_{\max}^2 = \max \left\lbrace \sigma_\beta^2, \sigma_0^2 \right\rbrace$. In particular, there exists a unique MAP estimator.
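To illustrate why strong convexity yields a unique MAP estimator, here is a minimal sketch. It does not use the SCoRa loss, which is not reproduced on this page. As an assumed stand-in it uses the classical Bradley–Terry likelihood with a Gaussian prior of variance $\sigma^2$, whose negative log-posterior is $\sigma^{-2}$-strongly convex. The function name `map_scores` and the toy comparison data are hypothetical. Gradient descent started from two different points should reach the same minimizer.

```python
import numpy as np

def map_scores(n_items, comparisons, sigma=1.0, init=None, lr=0.1, n_steps=2000):
    """MAP scores for a classical Bradley-Terry model with a Gaussian prior.

    Assumed stand-in for illustration, not the SCoRa loss.
    comparisons: list of (winner, loser) index pairs.
    Minimizes sum_k -log sigmoid(theta[w_k] - theta[l_k]) + ||theta||^2 / (2 sigma^2).
    """
    theta = np.zeros(n_items) if init is None else np.array(init, dtype=float)
    winners = np.array([w for w, _ in comparisons])
    losers = np.array([l for _, l in comparisons])
    for _ in range(n_steps):
        diff = theta[winners] - theta[losers]
        # Derivative of -log sigmoid(diff) with respect to diff.
        g = -1.0 / (1.0 + np.exp(diff))
        # Gradient of the Gaussian prior term.
        grad = theta / sigma**2
        # Accumulate likelihood gradients; np.add.at handles repeated indices.
        np.add.at(grad, winners, g)
        np.add.at(grad, losers, -g)
        theta -= lr * grad
    return theta

# Toy data: item 0 beats 1 twice, 1 beats 2 twice, 0 beats 2 twice.
comparisons = [(0, 1), (0, 1), (1, 2), (1, 2), (0, 2), (0, 2)]
theta_a = map_scores(3, comparisons)                         # start at zero
theta_b = map_scores(3, comparisons, init=[5.0, -3.0, 2.0])  # arbitrary start
print(np.allclose(theta_a, theta_b, atol=1e-6))
```

The fixed step size `lr=0.1` is chosen for this small example only. A larger dataset would need a smaller step or a line search, since the curvature of the likelihood term grows with the number of comparisons.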

Figures (5)

  • Figure 1: Convergence of the recovered scores irrespective of the mix, even under model mismatch. We see that the correlation goes to one as the budget increases, whether we use only comparisons, only ratings, or a mix of the two. Uniform root laws are used to generate the data of both plots. On the left, uniform root laws are also used for the inference, whereas Gaussian root laws are used for inference on the right. We use parameters $k_r=k_c=\infty$, $c_c=c_r=1$ and $p_c \in \{0, 0.5, 1\}$.
  • Figure 2: Weighted Correlation for $c_c=3$, $c_r=1$, $k_r=k_c=\infty$ and $\mathbf{b}\in \{500, 1000, 1500\}$.
  • Figure 3: Weighted Correlation for $c_c=8$, $c_r=1$, $k_r=k_c=2$ and $\mathbf{b}\in \{5000, 10000, 20000\}$.
  • Figure 4: Weighted Correlation for $c_c=8$, $c_r=1$, $k_c=k_r=2$ and $\mathbf{b} \in \{10^4, 2 \cdot 10^4, 10^5\}$. We use one-hot-encoded embeddings.
  • Figure 5: Comparison of active learning when the first phase is done with ratings to active learning when the first phase is done with comparisons. We use parameters $k_r=5$, $k_c=2$, $\mathbf{b}=10000$, $c_r=1$, $c_c=3$ and one-hot-encoded embeddings.

Theorems & Definitions (37)

  • Definition 2.1
  • Proposition 2.2
  • proof
  • Definition 2.3
  • Definition 2.4: Flexible linear GBT model
  • Theorem 2.5
  • proof
  • Remark 2.6
  • Proposition 3.1
  • proof
  • ...and 27 more