Human-AI Collaboration: Trade-offs Between Performance and Preferences

Lukas William Mayer, Sheer Karny, Jackie Ayoub, Miao Song, Danyang Tian, Ehsan Moradi-Pari, Mark Steyvers

TL;DR

This work investigates how people choose among collaborative AI agents and whether higher objective performance aligns with human preferences. It uses a dynamic target interception task with five rule-based agents that differ in how they respond to human actions, and it applies Bayesian models to link agent strategies, team performance, and preferences. The findings show that people favor human-centered agents that allow them to contribute meaningfully, even when such agents do not maximize raw performance; this pattern is consistent with inequality aversion driving human choices. The study suggests that designing collaborative AI with human preferences in mind can improve likability without reducing performance, and that subjective impressions and objective metrics complement each other when developing human-AI teams.

Abstract

Despite the growing interest in collaborative AI, designing systems that seamlessly integrate human input remains a major challenge. In this study, we developed a task to systematically examine human preferences for collaborative agents. We created and evaluated five collaborative AI agents with strategies that differ in the manner and degree to which they adapt to human actions. Participants interacted with a subset of these agents, evaluated their perceived traits, and selected their preferred agent. We used a Bayesian model to understand how agents' strategies influence human-AI team performance, the AI's perceived traits, and the factors shaping human preferences in pairwise agent comparisons. Our results show that agents that are more considerate of human actions are preferred over purely performance-maximizing agents. Moreover, we show that such human-centric design can improve the likability of AI collaborators without reducing performance. We find evidence that inequality aversion drives human choices, suggesting that people prefer collaborative agents that allow them to meaningfully contribute to the team. Taken together, these findings demonstrate how collaboration with AI can benefit from development efforts that include both subjective and objective metrics.

Paper Structure

This paper contains 20 sections, 2 equations, 18 figures, 3 tables.

Figures (18)

  • Figure 1: Illustration of the collaborative target interception task with a human player and an AI agent. The game is played in a circular environment where the participant (red avatar) and the AI agent (green avatar) have to collect points by intercepting moving targets (circles) that appear in the game area. New targets appear in the game area, move along a straight path, and then disappear again once they reach the game's edge. Players can click on a target to direct their avatar to the optimal interception point. Arrows are used to illustrate the path and speed of motion of targets and players, but they do not appear in the game environment. The cross-hairs on the targets indicate which target each agent is pursuing. Targets have different point values, as indicated by the orange fill. The game displays score metrics for both individual players and the team (right). Participants interact with various AI agents represented by color names.
  • Figure 2: Overview of AI collaborator behavior differences. a. The Ignorant agent always pursues the highest-value target, no matter what the human does. b. The Omit agent "omits" from consideration any target that the human intends, or is predicted, to intercept, and is otherwise equivalent to Ignorant. c. The Divide agent extends the logic of Omit by also considering only targets on its half of the game environment. d. The Delay agent approximates the reaction time the human is demonstrating and is otherwise equivalent to Omit. e. The Bottom-Feeder "inverts" the value function of Omit, so it always pursues the lowest-value target. Note that grayed-out targets are not visible to the search algorithm.
  • Figure 3: Illustration of the procedure in Experiment 1. Top and bottom rows (1 and 2) illustrate the two blocks in the experiment. Within each block, participants play two rounds, one with each AI agent from their assigned pair. Agents are then evaluated on a variety of dimensions using 7-point Likert scales. After submitting their ratings, participants indicate their preferred agent in a two-alternative forced choice. Finally, participants are asked to provide free-text responses explaining why they chose the agent they preferred. This procedure is repeated over two blocks where target density is varied. In the illustration, the first and second blocks have low and high target densities respectively. The density order is counterbalanced in the experiment.
  • Figure 4: Performance of the human and AI player by AI agent type (columns) and target density (rows) in Experiment 1. Performance is assessed by a relative score: the total points scored relative to the total points that were available during game play. Gray areas visualize the distribution of proportional scores; error bars show the standard error of the mean.
  • Figure 5: Mean team score by AI agent type and target density in Experiment 1. The team score is the sum of the scores of the AI agent and the human playing with that agent, relative to the total value of points that were available during game play. Gray shading indicates the distribution of values, while error bars show the standard error of the mean.
  • ...and 13 more figures
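
The target-selection rules in the Figure 2 caption and the relative team score in the Figure 5 caption can be illustrated with a short Python sketch. This is a simplified reading of the captions, not the authors' implementation. The `Target` record, the strategy names as strings, the convention that `x < 0` marks the agent's half, and the function names are all assumptions made for illustration. The Delay agent's reaction-time behavior is not modeled here, so in this sketch it selects targets exactly as Omit does.

```python
from dataclasses import dataclass
from typing import List, Optional

STRATEGIES = {"ignorant", "omit", "divide", "delay", "bottom_feeder"}


@dataclass(frozen=True)
class Target:
    # Hypothetical minimal target record; field names are illustrative only.
    ident: int
    value: float  # point value of the target
    x: float      # horizontal position; x < 0 is ASSUMED to be the agent's half


def visible_targets(strategy: str, targets: List[Target],
                    human_target_id: Optional[int]) -> List[Target]:
    """Return the targets the agent's selection step may consider."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    if strategy == "ignorant":
        # Ignorant considers every target, whatever the human does.
        return list(targets)
    # Omit, Divide, Delay and Bottom-Feeder drop the target the human
    # intends (or is predicted) to intercept.
    remaining = [t for t in targets if t.ident != human_target_id]
    if strategy == "divide":
        # Divide additionally keeps only targets on the agent's own half.
        remaining = [t for t in remaining if t.x < 0]
    return remaining


def choose_target(strategy: str, targets: List[Target],
                  human_target_id: Optional[int] = None) -> Optional[Target]:
    """Pick the target to pursue, or None if nothing is visible."""
    candidates = visible_targets(strategy, targets, human_target_id)
    if not candidates:
        return None
    if strategy == "bottom_feeder":
        # Bottom-Feeder inverts the value function: lowest value wins.
        return min(candidates, key=lambda t: t.value)
    # Delay's timing is not modeled; it chooses like Omit in this sketch.
    return max(candidates, key=lambda t: t.value)


def team_relative_score(human_points: float, ai_points: float,
                        points_available: float) -> float:
    """Team score as a fraction of the points available during play."""
    if points_available <= 0:
        return 0.0
    return (human_points + ai_points) / points_available


if __name__ == "__main__":
    targets = [Target(1, 5.0, -1.0), Target(2, 9.0, 2.0), Target(3, 2.0, -3.0)]
    for name in sorted(STRATEGIES):
        pick = choose_target(name, targets, human_target_id=2)
        print(name, "->", pick.ident if pick else None)
```

In this example the human is pursuing target 2, the most valuable one. Only the Ignorant agent competes for it, while the other agents fall back to a target the human is not chasing.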