Active Preference Learning for Ordering Items In- and Out-of-sample

Herman Bergström; Emil Carlsson; Devdatt Dubhashi; Fredrik D. Johansson

TL;DR

This work gives an upper bound on the expected ordering error of a logistic preference model as a function of which items have been compared, and proposes an active learning strategy that samples items to minimize this bound by accounting for aleatoric and epistemic uncertainty in comparisons.

Abstract

Learning an ordering of items based on pairwise comparisons is useful when items are difficult to rate consistently on an absolute scale, for example, when annotators have to make subjective assessments. When exhaustive comparison is infeasible, actively sampling item pairs can reduce the number of annotations necessary for learning an accurate ordering. However, many algorithms ignore shared structure between items, limiting their sample efficiency and precluding generalization to new items. It is also common to disregard how noise in comparisons varies between item pairs, despite it being informative of item similarity. In this work, we study active preference learning for ordering items with contextual attributes, both in- and out-of-sample. We give an upper bound on the expected ordering error of a logistic preference model as a function of which items have been compared. Next, we propose an active learning strategy that samples items to minimize this bound by accounting for aleatoric and epistemic uncertainty in comparisons. We evaluate the resulting algorithm, and a variant aimed at reducing model misspecification, in multiple realistic ordering tasks with comparisons made by human annotators. Our results demonstrate superior sample efficiency and generalization compared to non-contextual ranking approaches and active preference learning baselines.
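
The abstract's core ingredients, a logistic preference model over item attributes and a sampling rule that weighs aleatoric against epistemic uncertainty, can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names (`fit_logistic`, `select_pair`), the L2 penalty `l2`, and the scoring rule (a delta-method standard deviation of the predicted comparison probability) are assumptions chosen for illustration only.

```python
import numpy as np


def sigmoid(z):
    # Clip logits to avoid overflow in exp for extreme values.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def hessian(Z, theta, l2):
    # Hessian of the L2-regularized logistic loss at theta.
    p = sigmoid(Z @ theta)
    w = p * (1.0 - p)
    return (Z * w[:, None]).T @ Z + l2 * np.eye(Z.shape[1])


def fit_logistic(Z, y, l2=1.0, iters=25):
    """Fit P(i preferred over j) = sigmoid(theta^T (x_i - x_j)) by Newton's method.

    Z holds one attribute difference x_i - x_j per row; y holds 1.0 when
    item i won the comparison and 0.0 otherwise. Returns the estimate and
    the Hessian at that estimate.
    """
    theta = np.zeros(Z.shape[1])
    for _ in range(iters):
        grad = Z.T @ (sigmoid(Z @ theta) - y) + l2 * theta
        theta = theta - np.linalg.solve(hessian(Z, theta, l2), grad)
    return theta, hessian(Z, theta, l2)


def select_pair(X, pairs, theta, H):
    """Pick the candidate pair with the largest illustrative uncertainty score.

    Assumed score (not the paper's criterion): p(1 - p) * sqrt(z^T H^{-1} z),
    where p(1 - p) reflects comparison noise (aleatoric) and the square root
    term reflects parameter uncertainty along z (epistemic).
    """
    H_inv = np.linalg.inv(H)
    best, best_score = None, -np.inf
    for i, j in pairs:
        z = X[i] - X[j]
        p = sigmoid(z @ theta)
        score = p * (1.0 - p) * np.sqrt(z @ H_inv @ z)
        if score > best_score:
            best, best_score = (i, j), score
    return best
```

Because the model is fit on attribute differences, the learned `theta` can score items that were never compared, which is what makes out-of-sample ordering possible in this kind of setup.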

Paper Structure (37 sections, 3 theorems, 79 equations, 13 figures, 2 tables, 3 algorithms)

Introduction
Contributions.
Ordering items with active preference learning
Related work
Active Preference Learning:
Bandits:
RLHF:
Which comparisons result in a good ordering?
Greedy uncertainty reduction for ordering (GURO)
Computational Complexity:
Preference models for in- and out-of-sample ordering
Experiments
Ordering X-ray images under the logistic model
Ordering items with human preference data
Conclusion
...and 22 more sections

Key Result

Lemma 1

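Define, for all pairs of items $i,j \in \mathcal{I}$ and any $\Delta>0$, the quantities $\alpha_{ij}(\Delta)$ and $\beta_{ij}(\Delta)$ (their displayed definitions are missing from this extract). Then, if $\alpha \coloneqq \alpha_{ij}(\Delta)$, $\beta \coloneqq \beta_{ij}(\Delta)$ and $\alpha, \beta \leq \frac{1}{4dT}$, a concentration bound holds (the displayed inequality is likewise missing), in which the constant $C_1$ depends on $S, \lambda_0, Q$ from the paper's assumptions (see the theory appendix for the definition and proof).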

Figures (13)

Figure 1: X-RayAge. Performance of active sampling strategies when comparisons are simulated using a logistic model according to the paper's comparison equation (eq:comparison_lr). In-sample Kendall's Tau distance $R_{I_D}$ on 200 images (left) and generalization error $R_{I_E} - R_{I_D}$ for models trained on 150 images and evaluated on 150 images from a different distribution (right). All results are averaged over 100 random seeds.
Figure 2: ImageClarity. $n = 100$, $d = 63$.
Figure 3: WiscAds. $n = 935$, $d = 162$.
Figure 4: IMDB-WIKI-SbS. $n = 6\,072$, $d = 75$.
Figure 5: IMDB-WIKI-SbS. $n = 3\,000$ $(6\,072)$.
...and 8 more figures
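
The Kendall's Tau distance reported in Figure 1 can be illustrated with a short sketch. It assumes the normalized form, the fraction of item pairs that the predicted scores order differently from the true scores; the paper's exact definition (for example its treatment of ties) may differ, and the function name `kendall_tau_distance` is hypothetical.

```python
from itertools import combinations


def kendall_tau_distance(true_scores, pred_scores):
    """Fraction of item pairs ordered differently by the two score lists.

    Assumes both lists have the same length (at least 2). Pairs that are
    tied in either list are not counted as discordant in this sketch.
    """
    n = len(true_scores)
    discordant = 0
    for i, j in combinations(range(n), 2):
        a = true_scores[i] - true_scores[j]
        b = pred_scores[i] - pred_scores[j]
        if a * b < 0:  # opposite signs: the pair is ordered differently
            discordant += 1
    return discordant / (n * (n - 1) / 2)
```

Under this definition, a value of 0 means the two orderings agree on every pair and a value of 1 means one ordering is the exact reverse of the other.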

Theorems & Definitions (6)

Lemma 1: Concentration Lemma
Theorem 1: Upper bound on the ordering error
proof
proof
Proposition 1: Informal
proof
