When Are Two Lists Better than One?: Benefits and Harms in Joint Decision-making

Kate Donahue; Sreenivas Gollapudi; Kostas Kollias

TL;DR

This work analyzes a two-agent ranking framework where an algorithm reduces $n$ items to a top-$k$ set, and a human selects the final item, aiming to maximize the probability of choosing the true best item $x_1$. Using Mallows and Random Utility Models, it characterizes when the joint human-algorithm system outperforms either actor alone (complementarity), showing that for unanchored settings and certain $k$ (notably $k=2$ with equal accuracies), complementarity holds; with unequal accuracies, the human’s accuracy often has a larger impact. Anchoring the human on the algorithm’s ordering ($w_a>0$) generally destroys complementarity, with a complete anchor ($w_a=1$) making collaboration strictly worse than the algorithm alone; partial anchoring can still permit gain under small $k$. The paper also demonstrates that the observed complementarity phenomena extend to the Random Utility Model, suggesting robustness across permutation-generating processes. Overall, the results inform when collaborative filtering and human-in-the-loop decisions yield tangible improvements and when to avoid collaboration due to anchoring effects.

Abstract

Historically, much of machine learning research has focused on the performance of the algorithm alone, but recently more attention has been focused on optimizing joint human-algorithm performance. Here, we analyze a specific type of human-algorithm collaboration where the algorithm has access to a set of $n$ items, and presents a subset of size $k$ to the human, who selects a final item from among those $k$. This scenario could model content recommendation, route planning, or any type of labeling task. Because both the human and algorithm have imperfect, noisy information about the true ordering of items, the key question is: which value of $k$ maximizes the probability that the best item will be ultimately selected? For $k=1$, performance is optimized by the algorithm acting alone, and for $k=n$ it is optimized by the human acting alone. Surprisingly, we show that for multiple noise models, it is optimal to set $k \in [2, n-1]$; that is, there are strict benefits to collaborating, even when the human and algorithm have equal accuracy separately. We demonstrate this theoretically for the Mallows model and experimentally for the Random Utility Model of noisy permutations. However, we show this pattern is reversed when the human is anchored on the algorithm's presented ordering: the joint system always has strictly worse performance. We extend these results to the case where the human and algorithm differ in their accuracy levels, showing that there always exist regimes where a more accurate agent would strictly benefit from collaborating with a less accurate one, but these regimes are asymmetric between the human and the algorithm's accuracy.
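The setup described in the abstract can be checked exactly for small $n$ by enumerating all rankings. The sketch below is illustrative only and is not the authors' code. It assumes a Mallows distribution of the form $P(\pi) \propto \exp(-\phi \cdot d(\pi, \pi^*))$ with Kendall tau distance $d$, so that a larger $\phi$ means higher accuracy, and it assumes the human picks whichever presented item ranks highest in the human's own independently drawn ranking (no anchoring). The function names `kendall_tau`, `mallows_pmf`, and `joint_accuracy` are hypothetical.

```python
from itertools import permutations
from math import exp


def kendall_tau(perm):
    """Count inverted pairs relative to the true order 0, 1, ..., n-1."""
    n = len(perm)
    return sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])


def mallows_pmf(n, phi):
    """Exact distribution over all n! rankings.

    Assumed parametrization: weight exp(-phi * distance), so phi = 0 is
    uniform noise and larger phi concentrates mass on the true order.
    """
    weights = {p: exp(-phi * kendall_tau(p)) for p in permutations(range(n))}
    z = sum(weights.values())
    return {p: w / z for p, w in weights.items()}


def joint_accuracy(n, k, phi_a, phi_h):
    """Probability that the best item (item 0) is finally selected.

    The algorithm presents the top-k items of its noisy ranking; the human
    then picks the presented item that appears first in the human's own
    noisy ranking, drawn independently of the algorithm's.
    """
    alg = mallows_pmf(n, phi_a)
    hum = mallows_pmf(n, phi_h)
    total = 0.0
    for ranking_a, prob_a in alg.items():
        shown = set(ranking_a[:k])
        if 0 not in shown:
            continue  # the best item was filtered out by the algorithm
        for ranking_h, prob_h in hum.items():
            choice = next(item for item in ranking_h if item in shown)
            if choice == 0:
                total += prob_a * prob_h
    return total


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, joint_accuracy(3, k, phi_a=1.0, phi_h=1.0))
```

Under these assumptions, with $n=3$ and equal accuracies, $k=1$ and $k=3$ give the same value (each agent acting alone), while $k=2$ gives a strictly larger one, which matches the complementarity claim in the abstract. Enumeration costs $O((n!)^2)$, so this approach only suits very small $n$; larger instances would need sampling instead.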

Paper Structure (24 sections, 14 theorems, 32 equations, 4 figures)

Introduction
Related work
Models and notation
Human-algorithm collaboration model
Assumptions
Noise models
Mallows model
Random Utility model
Preliminary tools: mapping between good and bad events
Complementarity without anchoring
Equal accuracy in human and algorithm
Unequal accuracy
Impact of anchoring
Random Utility Model
Unanchored (extension of "Complementarity without anchoring")
...and 9 more sections

Key Result

Lemma 1

For any human algorithm system with $k < n$, there is a bijective mapping between "good events" and "bad events".

Figures (4)

Figure 1: A plot showing relative accuracy of the joint system for differing algorithm and human accuracy $\phi_a, \phi_h$, given a Mallows model distribution for each actor with $n=3, k=2$. The contour plot gives the accuracy of the joint human-algorithm system, which is strictly increasing in $\phi_a, \phi_h$. Overlaid in blue is the analytically derived region of complementarity. The regions derived in Lemmas 2 and 3 are overlaid in red and white, respectively. Note that this plot is symbolic and thus not based on simulations.
Figure 2: $n = 5$ total items, Mallows model, equal accuracy rates. Displaying the impact of (partial) anchoring. The $x$ axis gives the number of items presented ($k$): note that for $k=1$ this is equivalent to the algorithm picking alone, while for $k=n=5$ this is equivalent to the human picking alone. Weight measures how strongly the human anchors on the algorithm, with $0$ representing independence and $1$ representing the strongest anchoring (as in the theorem on complete anchoring, where complementarity is impossible). Each point represents the average of 10 trials of $5\cdot 10^4$ simulations each (error bars omitted, on the order of $0.01$).
Figure 3: A version of Figure 1, but given a RUM with Normal distribution for each actor with $n=10, k=2$. Similar to Figure 1, the $x$ and $y$ axes show increasing accuracy (here, decreasing variance). For clarity, we have flipped the axes to match Figure 1 so the lower left and upper right mean high noise and perfect accuracy, respectively. The yellow region is where complementarity occurs, while the purple region is where complementarity fails to occur, and the red line gives the $x=y$ axis of symmetry.
Figure 4: $n = 5$ total items, Random Utility model, equal accuracy rates. Displaying the impact of (partial) anchoring. Version of Figure 2 with the Random Utility model. The $x$ axis gives the number of items presented ($k$): note that for $k=1$ this is equivalent to the algorithm picking alone, while for $k=n=5$ this is equivalent to the human picking alone. Weight measures how strongly the human anchors on the algorithm, with $0$ representing independence and $1$ representing the strongest anchoring. Each point represents the average of 10 trials of $5\cdot 10^4$ simulations each (error bars omitted, on the order of $0.01$).

Theorems & Definitions (26)

Definition 1
Definition 2
Lemma 1
Corollary
Definition 3: Best-item mapping
Example 1
Theorem 1
Proof sketch
Lemma 2: More accurate human
Lemma 3: More accurate algorithm
...and 16 more
