
PARWiS: Winner determination under shoestring budgets using active pairwise comparisons

Shailendra Bhandari

TL;DR

This work extends PARWiS with a contextual variant (Contextual PARWiS) and a reinforcement learning-based variant (RL PARWiS), and compares them against baselines, including Double Thompson Sampling and a random selection strategy.

Abstract

Determining a winner among a set of items using active pairwise comparisons under a limited budget is a challenging problem in preference-based learning. The goal of this study is to implement and evaluate the PARWiS algorithm, which uses spectral ranking and disruptive pair selection to identify the best item under shoestring budgets. This work extends PARWiS with a contextual variant (Contextual PARWiS) and a reinforcement learning-based variant (RL PARWiS), and compares them against baselines, including Double Thompson Sampling and a random selection strategy. The evaluation spans synthetic and real-world datasets (Jester and MovieLens), using budgets of 40, 60, and 80 comparisons for 20 items. Performance is measured through recovery fraction, true rank of the reported winner, reported rank of the true winner, and cumulative regret, alongside the separation metric $\Delta_{1,2}$. Results show that PARWiS and RL PARWiS outperform the baselines across all datasets, particularly on the Jester dataset, which has a higher $\Delta_{1,2}$, while performance gaps narrow on the more challenging MovieLens dataset, which has a smaller $\Delta_{1,2}$. Contextual PARWiS performs comparably to PARWiS, indicating that contextual features may require further tuning to provide significant benefits.
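As a rough illustration of three of the metrics named in the abstract, the sketch below computes them from per-item scores. It is not the paper's evaluation code: the function names (`rank_of`, `evaluate_run`, `recovery_fraction`), the use of plain score lists, and the reading of each metric from its name alone are all assumptions made for illustration. Cumulative regret and $\Delta_{1,2}$ are left out because their definitions are not given here.

```python
def rank_of(item, scores):
    """1-based rank of `item` when items are sorted by descending score.

    Ties are broken by index (Python's sort is stable), which matches
    how `max` picks the first maximal item below.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return order.index(item) + 1


def evaluate_run(true_scores, est_scores):
    """Compare one run's estimated scores against the ground truth.

    Assumed interpretation (not taken from the paper):
      - the true winner is the item with the highest true score,
      - the reported winner is the item with the highest estimated score.
    """
    n = len(true_scores)
    true_winner = max(range(n), key=lambda i: true_scores[i])
    reported_winner = max(range(n), key=lambda i: est_scores[i])
    return {
        "recovered": reported_winner == true_winner,
        "true_rank_of_reported_winner": rank_of(reported_winner, true_scores),
        "reported_rank_of_true_winner": rank_of(true_winner, est_scores),
    }


def recovery_fraction(runs):
    """Fraction of runs in which the reported winner is the true winner."""
    return sum(1 for r in runs if r["recovered"]) / len(runs)


# Hypothetical toy data: 3 items, 2 runs.
true_scores = [0.5, 0.3, 0.2]
runs = [
    evaluate_run(true_scores, [0.3, 0.4, 0.3]),  # reports item 1 (wrong)
    evaluate_run(true_scores, [0.6, 0.3, 0.1]),  # reports item 0 (correct)
]
print(runs[0])
print(recovery_fraction(runs))
```

Under this reading, a recovery fraction near 1 and both rank metrics near 1 would indicate that a method reliably finds the best item within the comparison budget.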

Paper Structure

This paper contains 22 sections, 1 equation, 13 figures, 7 tables.

Figures (13)

  • Figure 1: Dataset Visualizations: (a) Heatmap of the preference matrix for the Synthetic dataset ($k=20$), showing pairwise probabilities $P_{i,j}$. (b) Histogram of ratings for the 20 selected jokes in Jester, ranging from -10 to 10. (c) Histogram of ratings for the 20 selected movies in MovieLens, ranging from 0.5 to 5. (d) Boxplot of $\Delta_{1,2}$ for the Synthetic dataset across 30 runs, showing the distribution of problem difficulty.
  • Figure 2: Performance on Synthetic Dataset at $B=40$. From left to right: Cumulative Regret, Recovery Fraction, True Rank of Reported Winner, Reported Rank of True Winner. Plots for $B=60$ and $B=80$ are in the Appendix.
  • Figure 3: Performance on Jester Dataset at $B=40$. From left to right: Cumulative Regret, Recovery Fraction, True Rank of Reported Winner, Reported Rank of True Winner. Plots for $B=60$ and $B=80$ are in the Appendix.
  • Figure 4: Performance on MovieLens Dataset at $B=40$. From left to right: Cumulative Regret, Recovery Fraction, True Rank of Reported Winner, Reported Rank of True Winner. Plots for $B=60$ and $B=80$ are in the Appendix.
  • Figure 5: Performance on Synthetic Dataset at $B=60$. From left to right: Cumulative Regret, Recovery Fraction, True Rank of Reported Winner, Reported Rank of True Winner.
  • ...and 8 more figures