Statistical Models of Top-$k$ Partial Orders

Amel Awadelkarim; Johan Ugander

TL;DR

This work introduces and taxonomizes approaches for jointly modeling distributions over top-$k$ partial orders and list lengths $k$, considering two classes of approaches: composite models that view a partial order as a truncation of a total order, and augmented ranking models that model the construction of the list as a sequence of choice decisions, including the decision to stop.

Abstract

In many contexts involving ranked preferences, agents submit partial orders over available alternatives. Statistical models often treat these as marginal in the space of total orders, but this approach overlooks information contained in the list length itself. In this work, we introduce and taxonomize approaches for jointly modeling distributions over top-$k$ partial orders and list lengths $k$, considering two classes of approaches: composite models that view a partial order as a truncation of a total order, and augmented ranking models that model the construction of the list as a sequence of choice decisions, including the decision to stop. For composite models, we consider three dependency structures for joint modeling of order and truncation length. For augmented ranking models, we consider different assumptions on how the stop-token choice is modeled. Using data consisting of partial rankings from San Francisco school choice and San Francisco ranked choice elections, we evaluate how well the models predict observed data and generate realistic synthetic datasets. We find that composite models, which explicitly model length as a categorical variable, produce synthetic datasets with accurate length distributions, and that an augmented model with position-dependent item utilities jointly models length and preferences in the training data best, as measured by negative log loss. Methods from this work have significant implications for the simulation and evaluation of real-world social systems that solicit ranked preferences.
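The two model classes described in the abstract can be illustrated with a minimal sampling sketch. It rests on assumptions that the abstract does not state: a Plackett-Luce-style softmax choice rule stands in for the ranking model, the composite sampler draws the length independently of the order (only one of the three dependency structures considered), and the augmented sampler uses a single constant stop utility. All function and parameter names (`softmax_choice`, `sample_composite`, `sample_augmented`, `stop_utility`) are hypothetical and are not taken from the paper.

```python
import math
import random


def softmax_choice(logits, rng):
    """Pick an index with probability proportional to exp(logit)."""
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - top) for x in logits]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def sample_total_order(utilities, rng):
    """Assumed ranking model: repeated softmax choice without replacement."""
    remaining = list(range(len(utilities)))
    order = []
    while remaining:
        pick = softmax_choice([utilities[i] for i in remaining], rng)
        order.append(remaining.pop(pick))
    return order


def sample_composite(utilities, length_probs, rng):
    """Composite sketch: total order R, categorical length k, Q = first k of R.

    Assumption: k is drawn independently of R.
    """
    order = sample_total_order(utilities, rng)
    lengths = range(1, len(utilities) + 1)
    k = rng.choices(lengths, weights=length_probs, k=1)[0]
    return order[:k]


def sample_augmented(utilities, stop_utility, rng):
    """Augmented sketch: each step chooses a remaining item or a stop token.

    Assumption: the stop token is unavailable at the first position, so
    every sampled list has length at least 1.
    """
    remaining = list(range(len(utilities)))
    ranking = []
    while remaining:
        logits = [utilities[i] for i in remaining]
        if ranking:
            logits.append(stop_utility)  # stop token is the last option
        pick = softmax_choice(logits, rng)
        if pick == len(remaining):  # only reachable when the stop token was added
            break
        ranking.append(remaining.pop(pick))
    return ranking


rng = random.Random(0)
utilities = [1.0, 0.5, 0.0, -0.5]
composite_list = sample_composite(utilities, [0.4, 0.3, 0.2, 0.1], rng)
augmented_list = sample_augmented(utilities, 0.0, rng)
```

In the composite sketch the length distribution is controlled directly by `length_probs`, whereas in the augmented sketch the length emerges from the repeated competition between the stop token and the remaining items.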

Paper Structure (17 sections, 28 equations, 5 figures, 2 tables, 2 algorithms)

Introduction
Related work
Preliminaries
Rankings
Statistical models
Composite models
Model specification
Ranking model
Length model
The augmented model
Model selection
Experiments
Setup
Goodness of fit
Synthetic datasets
...and 2 more sections

Figures (5)

Figure 1: Three dependence structures for composite models. $k\in[m]$ is a random variable representing length, $R\in\mathcal{L}(\mathcal{A})$ is a random variable representing a total order, $X\in\mathbb{R}^d$ are covariates, and $Q\in\Omega(\mathcal{A})$ is a top-$k$ partial order. Observed quantities are shaded in grey.
Figure 2: NLL loss (lower is better) of our test datasets under the six models in Table \ref{tab:models}. C-I (lightest blue) and A (yellow) are the baselines.
Figure 3: Statistics of sampled length distributions on a representative RCV (left) and SC (right) dataset. Statistics of the true distributions are shown in grey. C-CI was not evaluated on the RCV datasets as no voter covariates were available with the data.
Figure 4: Synthetic demand over 2018 SF Mayoral candidates (top row) and 2018-19 SFUSD program types (bottom row). True demand in grey. Left plots show proportion of choices in first position, right plots show proportion of choices overall.
Figure 5: Assignment outcomes using synthetic school choice datasets sampled from our 6 models compared with true outcomes in grey. Proportion of students who were assigned to their top-1, a top-3, or any one of their listed alternatives (as opposed to non-assignment).
