Towards Optimizing Human-Centric Objectives in AI-Assisted Decision-Making With Offline Reinforcement Learning
Zana Buçinca, Siddharth Swaroop, Amanda E. Paluch, Susan A. Murphy, Krzysztof Z. Gajos
TL;DR
This paper addresses how to optimize human-centric objectives in AI-assisted decision-making beyond mere accuracy. It introduces offline reinforcement learning to learn adaptive policies that tailor AI support to context and individual differences, notably Need for Cognition, aiming to maximize immediate accuracy and long-term learning. Across two experiments, accuracy-optimized policies consistently improve human accuracy and can achieve human-AI complementarity, while learning-optimized policies yield more nuanced gains and reveal that optimizing for learning is harder. The work demonstrates offline RL's potential to reveal insights about human-AI decision spaces and emphasizes designing AI assistance that supports learning and other human-centered outcomes alongside accuracy.
Abstract
Imagine if AI decision-support tools not only complemented our ability to make accurate decisions, but also improved our skills, boosted collaboration, and elevated the joy we derive from our tasks. Despite the potential to optimize a broad spectrum of such human-centric objectives, the design of current AI tools remains focused on decision accuracy alone. We propose offline reinforcement learning (RL) as a general approach for modeling human-AI decision-making to optimize human-AI interaction for diverse objectives. RL can optimize such objectives by tailoring decision support, providing the right type of assistance to the right person at the right time. We instantiated our approach with two objectives: human-AI accuracy on the decision-making task and human learning about the task and learned decision support policies from previous human-AI interaction data. We compared the optimized policies against several baselines in AI-assisted decision-making. Across two experiments (N=316 and N=964), our results demonstrated that people interacting with policies optimized for accuracy achieve significantly better accuracy -- and even human-AI complementarity -- compared to those interacting with any other type of AI support. Our results further indicated that human learning was more difficult to optimize than accuracy, with participants who interacted with learning-optimized policies showing significant learning improvement only at times. Our research (1) demonstrates offline RL to be a promising approach to model human-AI decision-making, leading to policies that may optimize human-centric objectives and provide novel insights about the AI-assisted decision-making space, and (2) emphasizes the importance of considering human-centric objectives beyond decision accuracy in AI-assisted decision-making, opening up the novel research challenge of optimizing human-AI interaction for such objectives.
