PARWiS: Winner determination under shoestring budgets using active pairwise comparisons

Shailendra Bhandari

TL;DR

This work extends PARWiS with a contextual variant (Contextual PARWiS) and a reinforcement learning-based variant (RL PARWiS), and compares them against baselines including Double Thompson Sampling and a random selection strategy.

Abstract

Determining a winner among a set of items using active pairwise comparisons under a limited budget is a challenging problem in preference-based learning. This study implements and evaluates the PARWiS algorithm, which uses spectral ranking and disruptive pair selection to identify the best item under shoestring budgets. It extends PARWiS with a contextual variant (Contextual PARWiS) and a reinforcement learning-based variant (RL PARWiS), and compares them against baselines including Double Thompson Sampling and a random selection strategy. The evaluation spans synthetic and real-world datasets (Jester and MovieLens), using budgets of 40, 60, and 80 comparisons for 20 items. Performance is measured by recovery fraction, true rank of the reported winner, reported rank of the true winner, and cumulative regret, alongside the separation metric $\Delta_{1,2}$. Results show that PARWiS and RL PARWiS outperform the baselines across all datasets, particularly on the Jester dataset, which has a larger $\Delta_{1,2}$, while the performance gaps narrow on the more challenging MovieLens dataset, which has a smaller $\Delta_{1,2}$. Contextual PARWiS performs comparably to PARWiS, suggesting that its contextual features may require further tuning to provide significant benefits.
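PARWiS is described as combining spectral ranking with disruptive pair selection. The sketch below illustrates only the spectral-ranking half, using a Rank Centrality-style random walk over pairwise win fractions, together with a budgeted loop that queries uniformly random pairs, in the spirit of the random-selection baseline. Several pieces are assumptions not stated in the abstract: that the spectral step resembles Rank Centrality, that comparison outcomes follow a Bradley-Terry-Luce (BTL) model with weights `w`, and the hypothetical helper names `rank_centrality` and `random_pair_baseline`. The disruptive pair-selection rule of PARWiS is not reproduced here.

```python
import numpy as np


def rank_centrality(wins, tol=1e-12, max_iter=10_000):
    """Rank Centrality-style spectral scores from a win-count matrix.

    wins[i, j] = number of comparisons in which item i beat item j.
    Returns a probability vector; a larger value means a stronger item.
    """
    wins = np.asarray(wins, dtype=float)
    n = wins.shape[0]
    totals = wins + wins.T  # comparisons made for each pair
    # frac[i, j] = fraction of (i, j) comparisons won by j; 0 if never compared
    frac = np.divide(wins.T, totals, out=np.zeros((n, n)), where=totals > 0)
    np.fill_diagonal(frac, 0.0)
    d_max = max(int((totals > 0).sum(axis=1).max()), 1)  # max degree of comparison graph
    P = frac / d_max  # the walk moves from i toward items that beat i
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))  # self-loops make each row sum to 1
    Q = 0.5 * (np.eye(n) + P)  # lazy walk: same stationary distribution, aperiodic
    pi = np.full(n, 1.0 / n)
    for _ in range(max_iter):  # power iteration toward the stationary distribution
        nxt = pi @ Q
        done = np.abs(nxt - pi).sum() < tol
        pi = nxt
        if done:
            break
    return pi / pi.sum()


def random_pair_baseline(w, budget, rng):
    """Spend `budget` comparisons on uniformly random pairs.

    Outcomes are simulated with a BTL model with positive weights w
    (an assumption for illustration). Returns the win-count matrix.
    """
    k = len(w)
    wins = np.zeros((k, k))
    for _ in range(budget):
        i, j = rng.choice(k, size=2, replace=False)  # two distinct items
        if rng.random() < w[i] / (w[i] + w[j]):  # P(i beats j) under BTL
            wins[i, j] += 1
        else:
            wins[j, i] += 1
    return wins


# Usage: 20 items and a budget of 40 comparisons, as in the evaluation above.
rng = np.random.default_rng(0)
w = rng.uniform(0.1, 1.0, size=20)  # hypothetical latent item strengths
wins = random_pair_baseline(w, budget=40, rng=rng)
scores = rank_centrality(wins)
reported_winner = int(np.argmax(scores))
true_winner = int(np.argmax(w))
```

Items that never win a comparison receive no in-flow in this walk and end up with zero score; Rank Centrality implementations often add small pseudo-counts to avoid this, which the sketch omits.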

Paper Structure

This paper contains 22 sections, 1 equation, 13 figures, and 7 tables.

Introduction
Related work
  Dueling bandits and regret minimization
  Active learning for ranking and winner determination
  Contextual dueling bandits
  Spectral ranking and preference aggregation
  Real-world datasets for preference learning
  Challenges in shoestring budgets
Methodology
  Problem setting
  Algorithms
  Datasets
  Evaluation metrics
Experiments
  Setup
...and 7 more sections

Figures (13)

Figure 1: Dataset Visualizations: (a) Heatmap of the preference matrix for the Synthetic dataset ($k=20$), showing pairwise probabilities $P_{i,j}$. (b) Histogram of ratings for the 20 selected jokes in Jester, ranging from -10 to 10. (c) Histogram of ratings for the 20 selected movies in MovieLens, ranging from 0.5 to 5. (d) Boxplot of $\Delta_{1,2}$ for the Synthetic dataset across 30 runs, showing the distribution of problem difficulty.
Figure 2: Performance on the Synthetic dataset at $B=40$. From left to right: Cumulative Regret, Recovery Fraction, True Rank of Reported Winner, Reported Rank of True Winner. Plots for $B=60$ and $B=80$ are given in the appendix.
Figure 3: Performance on the Jester dataset at $B=40$. From left to right: Cumulative Regret, Recovery Fraction, True Rank of Reported Winner, Reported Rank of True Winner. Plots for $B=60$ and $B=80$ are given in the appendix.
Figure 4: Performance on the MovieLens dataset at $B=40$. From left to right: Cumulative Regret, Recovery Fraction, True Rank of Reported Winner, Reported Rank of True Winner. Plots for $B=60$ and $B=80$ are given in the appendix.
Figure 5: Performance on the Synthetic dataset at $B=60$. From left to right: Cumulative Regret, Recovery Fraction, True Rank of Reported Winner, Reported Rank of True Winner.
...and 8 more figures
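The four panels in Figures 2 to 5 report quantities that can be computed per run from the true item scores and the scores an algorithm reports once its budget is spent. The sketch below is one plausible reading of those metrics; the tie-breaking rule, the hypothetical helper names (`rank_of`, `winner_metrics`, `cumulative_regret`), and the regret formula are assumptions. Regret here is the common dueling-bandit per-step regret $(P_{b,i} + P_{b,j} - 1)/2$ for a queried pair $(i, j)$, where $b$ is the best item and $P_{i,j}$ is the probability that $i$ beats $j$, as in Figure 1; the paper may define it differently.

```python
import numpy as np


def rank_of(item, scores):
    """1-based rank of `item` when items are sorted by descending score (ties: lower index first)."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    return int(np.flatnonzero(order == item)[0]) + 1


def winner_metrics(true_scores, est_scores):
    """Per-run winner metrics; a higher score means a better item in both inputs."""
    true_winner = int(np.argmax(true_scores))
    reported_winner = int(np.argmax(est_scores))
    return {
        "recovered": reported_winner == true_winner,
        "true_rank_of_reported_winner": rank_of(reported_winner, true_scores),
        "reported_rank_of_true_winner": rank_of(true_winner, est_scores),
    }


def cumulative_regret(P, queried_pairs, best):
    """Sum of (P[best, i] + P[best, j] - 1) / 2 over queried pairs (i, j).

    P[i, j] is the probability that item i beats item j (assumed definition).
    """
    return float(sum((P[best, i] + P[best, j] - 1.0) / 2.0 for i, j in queried_pairs))


# Usage: recovery fraction is the mean of "recovered" over independent runs.
runs = [
    winner_metrics([0.9, 0.5, 0.7, 0.1], [0.6, 0.2, 0.8, 0.1]),  # true winner missed
    winner_metrics([0.9, 0.5, 0.7, 0.1], [0.9, 0.3, 0.4, 0.2]),  # true winner found
]
recovery_fraction = float(np.mean([r["recovered"] for r in runs]))  # 0.5
```

Because `rank_of` breaks ties by lower index, as `np.argmax` does, the reported winner always has reported rank 1, so the two rank metrics stay consistent with the winner each algorithm reports.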
