Carrot and Stick: Eliciting Comparison Data and Beyond

Yiling Chen, Shi Feng, Fang-Yi Yu

TL;DR

Peer prediction mechanisms for eliciting comparison data with a bonus-penalty payment. The design leverages the strong stochastic transitivity of comparison data to obtain symmetrically strongly truthful mechanisms, in which truth-telling forms a strict Bayesian Nash equilibrium and yields the highest payment among all symmetric equilibria.

Abstract

Comparison data elicited from people are fundamental to many machine learning tasks, including reinforcement learning from human feedback for large language models and estimating ranking models. They are typically subjective and not directly verifiable. How can such comparison data be truthfully elicited from rational individuals? We design peer prediction mechanisms for eliciting comparison data using a bonus-penalty payment. Our design leverages the strong stochastic transitivity of comparison data to create symmetrically strongly truthful mechanisms such that truth-telling 1) forms a strict Bayesian Nash equilibrium, and 2) yields the highest payment among all symmetric equilibria. In our mechanism, each individual only needs to evaluate one pair of items and report her comparison. We further extend the bonus-penalty payment concept to eliciting networked data, designing a symmetrically strongly truthful mechanism when agents' private signals are sampled according to Ising models. We provide necessary and sufficient conditions for our bonus-penalty payment to have truth-telling as a strict Bayesian Nash equilibrium. Experiments on two real-world datasets further support our theoretical findings.

Paper Structure

This paper contains 29 sections, 15 theorems, 58 equations, 8 figures, 2 tables, 3 algorithms.

Key Result

Proposition 2.3

For any strictly increasing $F$ and non-atomic $\nu$ on $\mathbb{R}$, the parametric model in Example 2.2 is a Bayesian SST model.
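
As a rough illustration of why a strictly increasing $F$ matters, the sketch below checks the classical (non-Bayesian) form of strong stochastic transitivity for a Bradley-Terry-Luce style model. It assumes the common parametric form $\Pr[a \succ b] = F(\theta_a - \theta_b)$ with a logistic $F$; that form, the item scores `thetas`, and the helper names `win_prob` and `satisfies_sst` are illustrative assumptions and are not taken from the paper, whose Bayesian SST definition (with a prior $\nu$ over parameters) is not reproduced on this page.

```python
import itertools
import math


def logistic(x: float) -> float:
    """Strictly increasing link function F with F(0) = 1/2 (assumed choice)."""
    return 1.0 / (1.0 + math.exp(-x))


def win_prob(theta_a: float, theta_b: float, F=logistic) -> float:
    """Assumed parametric form: Pr[a beats b] = F(theta_a - theta_b)."""
    return F(theta_a - theta_b)


def satisfies_sst(thetas, F=logistic, tol: float = 1e-12) -> bool:
    """Check classical SST on every ordered triple of distinct items.

    SST: if Pr[a > b] >= 1/2 and Pr[b > c] >= 1/2,
    then Pr[a > c] >= max(Pr[a > b], Pr[b > c]).
    """
    for a, b, c in itertools.permutations(range(len(thetas)), 3):
        p_ab = win_prob(thetas[a], thetas[b], F)
        p_bc = win_prob(thetas[b], thetas[c], F)
        p_ac = win_prob(thetas[a], thetas[c], F)
        if p_ab >= 0.5 and p_bc >= 0.5 and p_ac < max(p_ab, p_bc) - tol:
            return False
    return True


# Hypothetical item scores, chosen only for illustration.
thetas = [0.3, -1.2, 2.0, 0.9]
sst_holds = satisfies_sst(thetas)
```

The check passes for any scores because $\theta_a \ge \theta_b \ge \theta_c$ implies $\theta_a - \theta_c \ge \max(\theta_a - \theta_b, \theta_b - \theta_c)$, and a strictly increasing $F$ preserves that ordering.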

Figures (8)

  • Figure 1: SUSHI preference dataset
  • Figure 2: Last.fm dataset for Lady Gaga
  • Figure 3: For any fixed $\underline{\beta}, \overline{\beta}$, we can construct a simple graph with $V = \{v_0,\dots, v_{n-1}\}$ and $E = \{(v_0,v_l), (v_l, v_{n-1}): l = 1,\dots, n-2\}$, where agents $v_0$ and $v_{n-1}$ are not connected but share $n-2$ common friends. We can show that the correlation between ${S}_0$ and ${S}_{n-1}$ converges to $1$ as the number of common friends $d = n-2$ increases, while the correlation between ${S}_0$ and ${S}_1$ is bounded away from $1$.
  • Figure 4: ECDF comparisons on all users without any selection.
  • Figure 5: In each of the rows, we present the ECDF comparisons after changing the selection criteria for the user group as follows: from female to male, from ages 30–49 to ages 5–29, from ages 30–49 to ages 50+, respectively.
  • ...and 3 more figures
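
The claim in the Figure 3 caption can be checked numerically on small instances. The following is a minimal sketch, assuming a zero-field Ising model with a single coupling strength `beta` on every edge, i.e. $\Pr[s] \propto \exp(\beta \sum_{(u,v) \in E} s_u s_v)$; the paper allows couplings between $\underline{\beta}$ and $\overline{\beta}$, so the uniform `beta`, its value, and the helper names are simplifying assumptions. With no external field each spin has mean $0$ and variance $1$, so $\mathbb{E}[S_i S_j]$ equals the correlation.

```python
import itertools
import math


def common_friend_edges(d: int):
    """Graph from the caption: v_0 and v_{n-1} share d = n - 2 common friends."""
    n = d + 2
    edges = []
    for l in range(1, n - 1):
        edges.append((0, l))      # v_0 -- v_l
        edges.append((l, n - 1))  # v_l -- v_{n-1}
    return n, edges


def pair_correlation(n: int, edges, beta: float, i: int, j: int) -> float:
    """Exact E[S_i S_j] by brute-force enumeration of all 2^n spin vectors."""
    partition = 0.0
    weighted = 0.0
    for spins in itertools.product((-1, 1), repeat=n):
        weight = math.exp(beta * sum(spins[u] * spins[v] for u, v in edges))
        partition += weight
        weighted += weight * spins[i] * spins[j]
    return weighted / partition


beta = 0.5  # assumed uniform coupling strength
far, near = [], []
for d in range(1, 9):
    n, edges = common_friend_edges(d)
    far.append(pair_correlation(n, edges, beta, 0, n - 1))  # S_0 vs S_{n-1}
    near.append(pair_correlation(n, edges, beta, 0, 1))     # S_0 vs S_1
```

Under these assumptions, summing out the common friends gives $\mathbb{E}[S_0 S_{n-1}] = (c^d - 1)/(c^d + 1)$ with $c = \cosh(2\beta)$, which tends to $1$ as $d$ grows, while $\mathbb{E}[S_0 S_1]$ stays below $\tanh(2\beta) < 1$.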

Theorems & Definitions (40)

  • Definition 2.1: [vail1953stochastic, davidson1959experimental]
  • Example 2.2: Bradley-Terry-Luce, Thurstone model, and more [tversky1969substitutability]
  • Proposition 2.3
  • Example 2.4: Mallows $\eta$-model
  • Proposition 2.5
  • Definition 2.6
  • Theorem 1
  • Remark 3.1
  • Definition 4.1
  • Lemma 4.2
  • ...and 30 more