Scorio.jl: A Julia package for ranking stochastic responses

Mohsen Hariri, Michael Hinczewski, Vipin Chaudhary

Abstract

Scorio.jl is a Julia package for evaluating and ranking systems from repeated responses to shared tasks. It provides a common tensor-based interface for direct score-based, pairwise, psychometric, voting, graph, and listwise methods, so the same benchmark can be analyzed under multiple ranking assumptions. We describe the package design, position it relative to existing Julia tools, and report pilot experiments on synthetic rank recovery, stability under limited trials, and runtime scaling.

Paper Structure (14 sections, 3 figures, 3 tables)

Figures (3)

  • Figure 1: Left: Kendall $\tau_b$ between predicted and ground-truth rankings as the number of trials grows. Most methods cluster near $1.0$ quickly; Elo does not. Right: stability of limited-trial rankings relative to a bayes reference at $N_{\max}=64$. Method differences are most visible in the small-$n$ regime.
  • Figure 2: Complementary views of the same experiments. Left: mean absolute rank error versus trials. Right: top-1 agreement under limited trials. mG-Pass@k degenerates to all ties at $n=1$; Elo remains unreliable throughout.
  • Figure 3: Runtime versus the number of tasks $M$ for several $(L,N)$ configurations. Score-based and graph methods stay fast across the grid. Rasch and Kemeny-Young are substantially more expensive.