
Properties of the Mallows Model Depending on the Number of Alternatives: A Warning for an Experimentalist

Niclas Boehmer, Piotr Faliszewski, Sonja Kraiczy

TL;DR

The paper studies how the Mallows distribution over rankings behaves as the number of alternatives $m$ grows, contrasting the classic dispersion parameter $\phi$ with the normalized variant $\mathrm{norm}\hbox{-}\phi$. It develops a rigorous asymptotic framework to compare properties under both parameterizations, derives exact and asymptotic expressions for top-choice position, pairwise comparisons, and winner probabilities, and provides theoretical results plus empirical evidence using real-world data. The key finding is that the classic Mallows model often exhibits structure that drifts with $m$, while the normalized variant maintains stable, data-aligned properties; this motivates preferring the normalized approach in experiments involving varying numbers of alternatives. The paper offers practical warnings for experiment design, parameter estimation, and generalization across different $m$, and provides publicly available code for replication.

Abstract

The Mallows model is a popular distribution for ranked data. We empirically and theoretically analyze how the properties of rankings sampled from the Mallows model change when increasing the number of alternatives. We find that real-world data behaves differently than the Mallows model, yet is in line with its recent variant proposed by Boehmer et al. [2021]. As part of our study, we issue several warnings about using the model.
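
To make the relation between the two parameterizations concrete, the following is a minimal sketch of how a normalized dispersion value could be converted into a classic $\phi$ for a given $m$. It assumes the definition commonly attributed to Boehmer et al. [2021]: $\phi$ is chosen so that the expected swap distance from the central ranking equals $\mathrm{norm}\hbox{-}\phi \cdot m(m-1)/4$. That definition, the function names, and the bisection search are assumptions of this sketch and are not taken from the paper's code.

```python
def expected_swaps(m: int, phi: float) -> float:
    """Expected swap (Kendall tau) distance from the central ranking.

    Uses the standard decomposition of Mallows inversions into independent
    truncated-geometric variables V_i on {0, ..., i-1} with P(V_i = k)
    proportional to phi**k.
    """
    total = 0.0
    for i in range(1, m + 1):
        weights = [phi ** k for k in range(i)]
        total += sum(k * w for k, w in enumerate(weights)) / sum(weights)
    return total


def phi_from_norm_phi(m: int, norm_phi: float, tol: float = 1e-12) -> float:
    """Hypothetical helper: find phi whose expected swap distance matches
    norm_phi * m * (m - 1) / 4 (assumed definition), via bisection."""
    target = norm_phi * m * (m - 1) / 4
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # expected_swaps is increasing in phi, so bisection applies
        if expected_swaps(m, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    for m in (5, 20, 50):
        print(m, phi_from_norm_phi(m, 0.5))
```

Under this assumed definition, a fixed normalized value maps to a larger classic $\phi$ as $m$ grows, which is one way to see why holding $\phi$ fixed across different $m$ changes the relative amount of noise.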

Paper Structure (31 sections, 23 theorems, 41 equations, 9 figures, 1 table)

Key Result

Corollary 2.3

For fixed $\phi<1$, $\lim_{m\to \infty} g^{\mathrm{swap}}_m(\phi)=0$.
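
The limit can be checked numerically. The sketch below assumes that $g^{\mathrm{swap}}_m(\phi)$ denotes the expected swap distance from the central ranking divided by the maximum swap distance $m(m-1)/2$; the exact normalization used in the paper may differ by a constant factor, which would not affect the limit. The function names are illustrative only.

```python
def expected_swap_distance(m: int, phi: float) -> float:
    """Expected number of inversions relative to the central ranking.

    The inversion count of a Mallows-distributed ranking is a sum of
    independent variables V_i on {0, ..., i-1} with P(V_i = k)
    proportional to phi**k, for i = 1, ..., m.
    """
    total = 0.0
    for i in range(1, m + 1):
        weights = [phi ** k for k in range(i)]
        total += sum(k * w for k, w in enumerate(weights)) / sum(weights)
    return total


def normalized_expected_swap_distance(m: int, phi: float) -> float:
    """Assumed normalization: divide by the maximum swap distance m(m-1)/2."""
    if m < 2:
        raise ValueError("need at least two alternatives")
    return expected_swap_distance(m, phi) / (m * (m - 1) / 2)


if __name__ == "__main__":
    # For fixed phi < 1 the printed values shrink toward 0 as m grows.
    for m in (5, 20, 50, 200):
        print(m, normalized_expected_swap_distance(m, 0.8))
```

Intuitively, for fixed $\phi<1$ each $V_i$ has expectation bounded by $\phi/(1-\phi)$, so the expected swap distance grows at most linearly in $m$ while the normalizer grows quadratically.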

Figures (9)

  • Figure 1: Expected normalized swap distance of a sampled ranking from the central one (solid lines for the classic model and dashed ones for the normalized variant).
  • Figure 2: Average Plurality score of the Plurality winner in profiles of $100$ rankings with a varying number of alternatives. We compare sampling profiles directly for a varying number $m$ of alternatives (dashed) with sampling profiles for $m=200$ alternatives and subsequently deleting some alternatives uniformly at random (solid).
  • Figure 3: Influence of the number of alternatives $m$ on different properties of rankings (ranking profiles) sampled from the Mallows model for fixed values of the classical dispersion parameter $\phi$ (solid) and the normalized dispersion parameter ${{\mathrm{norm}\hbox{-}\phi}}$ (dashed). For $\phi={{\mathrm{norm}\hbox{-}\phi}}=0$ and $\phi={{\mathrm{norm}\hbox{-}\phi}}=1$ the respective lines overlap.
  • Figure 4: Plots showing the normalized positionwise distance of a profile from ID depending on the number of alternatives in the profile. Each point corresponds to one profile. For Tour de France, the color of a point corresponds to the year of the respective edition.
  • Figure 5: Average positionwise distance from ID of profiles with $n=100$ rankings (see \ref{sub:posDisID}). Light grey points are from \ref{fig:spooo}.
  • ...and 4 more figures

Theorems & Definitions (38)

  • Corollary 2.3
  • Theorem 3.1
  • Definition 3.2
  • Proposition 3.2
  • Theorem 3.3
  • Theorem 3.4
  • Proposition 3.4
  • Theorem 3.5
  • Proposition 3.5
  • Theorem 3.6
  • ...and 28 more