Human-AI Collaboration with Misaligned Preferences

Jiaxin Song; Parnian Shahkar; Kate Donahue; Bhaskar Ray Chaudhury

TL;DR

This work models human-AI collaboration as a curator algorithm that selects a top-$k$ subset from $m$ items for humans who have noisy and potentially misaligned preferences. It analyzes when misalignment with the algorithm can improve individual and population welfare using Mallows and Plackett-Luce permutation models, and provides both theoretical results and empirical validation. A key finding is that misalignment can be beneficial for some users and settings. Utilitarian welfare maximization is NP-hard in general but can be addressed with a mixed-integer program, and uplift is achievable under certain conditions and small amounts of noise. The study highlights a practical tension between maximizing overall welfare and ensuring uplift for all user types, and discusses design and policy implications, including when to allow, or even encourage, stochasticity and misalignment in algorithmic tools. These insights inform the design of algorithmic curation tools and policies that balance personalization with fairness across heterogeneous user populations.

Abstract

In many real-life settings, algorithms play the role of assistants, while humans ultimately make the final decision. Often, algorithms specifically act as curators, narrowing down a wide range of options into a smaller subset from which the human chooses: consider content recommendation or chatbot responses to questions with multiple valid answers. Crucially, humans may not know their own preferences perfectly either, but instead may only have access to a noisy sampling over preferences. Algorithms can assist humans by curating a smaller subset of items, but must also face the challenge of misalignment: humans may have different preferences from each other (and from the algorithm), and the algorithm may not know the exact preferences of the human it is facing at any point in time. In this paper, we model and theoretically study such a setting. Specifically, we show instances where humans benefit by collaborating with a misaligned algorithm. Surprisingly, we show that humans can gain more utility from a misaligned algorithm (which makes different mistakes) than from an aligned algorithm. Next, we build on this result by studying what properties of algorithms maximize human welfare when the goal is either utilitarian welfare or ensuring that all humans benefit. We conclude by discussing implications for designers of algorithmic tools and policymakers.
