Towards Optimizing Human-Centric Objectives in AI-Assisted Decision-Making With Offline Reinforcement Learning

Zana Buçinca, Siddharth Swaroop, Amanda E. Paluch, Susan A. Murphy, Krzysztof Z. Gajos

TL;DR

This paper addresses how to optimize human-centric objectives in AI-assisted decision-making beyond mere accuracy. It introduces offline reinforcement learning to learn adaptive policies that tailor AI support to context and individual differences, notably Need for Cognition, aiming to maximize immediate accuracy and long-term learning. Across two experiments, accuracy-optimized policies consistently improve human accuracy and can achieve human-AI complementarity, while learning-optimized policies yield more nuanced gains and reveal that optimizing for learning is harder. The work demonstrates offline RL's potential to reveal insights about human-AI decision spaces and emphasizes designing AI assistance that supports learning and other human-centered outcomes alongside accuracy.

Abstract

Imagine if AI decision-support tools not only complemented our ability to make accurate decisions, but also improved our skills, boosted collaboration, and elevated the joy we derive from our tasks. Despite the potential to optimize a broad spectrum of such human-centric objectives, the design of current AI tools remains focused on decision accuracy alone. We propose offline reinforcement learning (RL) as a general approach for modeling human-AI decision-making to optimize human-AI interaction for diverse objectives. RL can optimize such objectives by tailoring decision support, providing the right type of assistance to the right person at the right time. We instantiated our approach with two objectives, human-AI accuracy on the decision-making task and human learning about the task, and learned decision support policies for each from previous human-AI interaction data. We compared the optimized policies against several baselines in AI-assisted decision-making. Across two experiments (N=316 and N=964), our results demonstrated that people interacting with policies optimized for accuracy achieved significantly better accuracy, and even human-AI complementarity, compared to those interacting with any other type of AI support. Our results further indicated that human learning was more difficult to optimize than accuracy, with participants who interacted with learning-optimized policies showing significant learning improvement only at times. Our research (1) demonstrates offline RL to be a promising approach to model human-AI decision-making, leading to policies that may optimize human-centric objectives and provide novel insights about the AI-assisted decision-making space, and (2) emphasizes the importance of considering human-centric objectives beyond decision accuracy in AI-assisted decision-making, opening up the novel research challenge of optimizing human-AI interaction for such objectives.
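
To make "learning decision support policies from previous interaction data" concrete, here is a minimal sketch of tabular Q-learning run over a fixed log of transitions. It is an illustration only: the abstract does not specify the state representation, reward definition, or learning algorithm, so the state indices, the two assistance types, the reward values, and the helper names `offline_q_learning` and `greedy_policy` are all hypothetical.

```python
def offline_q_learning(transitions, n_states, n_actions,
                       gamma=0.9, lr=0.1, epochs=200):
    """Tabular Q-learning over a fixed log; no new interaction is collected.

    Each transition is (state, action, reward, next_state, done).
    """
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(epochs):
        for s, a, r, s_next, done in transitions:
            # Bootstrapped target: reward plus discounted best next value.
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += lr * (target - q[s][a])
    return q


def greedy_policy(q):
    """Pick the highest-valued action (type of AI assistance) in each state."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]


# Hypothetical toy log: 2 states, 2 assistance types
# (0 = no AI, 1 = explanation); reward 1.0 stands for a correct decision.
log = [
    (0, 0, 0.0, 1, False),
    (0, 1, 1.0, 1, False),
    (1, 0, 1.0, 1, True),
    (1, 1, 0.0, 1, True),
]
q = offline_q_learning(log, n_states=2, n_actions=2)
policy = greedy_policy(q)
```

In this toy log the learned policy selects the explanation in state 0 and no AI in state 1. Swapping the reward signal (for example, a learning measure instead of immediate correctness) changes the objective being optimized without changing the procedure.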

Paper Structure (60 sections, 2 equations, 14 figures, 3 tables)


Figures (14)

  • Figure 1: An example of the exercise prescription decision-making task with different types of AI assistance (i.e., actions). Participants were assisted in choosing between the two sets of exercises as depicted for different conditions. In the No-AI condition (not shown) participants were not provided with any AI assistance.
  • Figure 2: An overview of the experiment flow for the data collection and evaluation studies. In the evaluation studies, participants were randomly assigned to one of the optimal policies (that matched their NFC level) or a baseline policy.
  • Figure 3: Distributions of types of AI assistance selected by the optimal policies for different objectives and NFC groups. Each bar in the figure represents the percentage of states in which an action was the top action, with the numerator being the number of states where the action was the top choice and the denominator being the total number of states in the analysis.
  • Figure 4: Randomization test results. Each facet depicts the distribution of $\chi^2$ values across 1000 datasets with randomly assigned NFC groups for the given analysis, along with the $\chi^2$ on the actual dataset (in blue). "NFC objective = accuracy", for example, shows the difference between the action distributions of the two NFC groups with accuracy as the objective. The p-value is computed as the fraction of sampled datasets whose $\chi^2$ exceeded the actual $\chi^2$.
  • Figure 5: Experiment 1: Marginal means of participants interacting with the three policies: accuracy, learning, SXAI, on the two objectives: immediate accuracy and learning. Error bars indicate one standard error. The dashed line in (a) indicates the performance of the AI. Significance levels (if any) are depicted with letters. Conditions not connected by the same letter are significantly different.
  • ...and 9 more figures
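
The quantities described in the captions of Figures 3 and 4 can be sketched as follows: the share of states in which each action is the top action, a Pearson $\chi^2$ statistic comparing the action distributions of two groups, and a randomization p-value obtained by shuffling group labels. This is a simplified sketch under stated assumptions. The captions do not say how each resampled dataset is turned into a $\chi^2$ value, so the sketch simply shuffles labels over a fixed list of top actions; the function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def top_action_shares(top_actions):
    """Fraction of states in which each action is the top action."""
    counts = Counter(top_actions)
    total = len(top_actions)
    return {action: counts[action] / total for action in counts}


def chi_square(groups, actions):
    """Pearson chi-square statistic for the group x action table."""
    n = len(groups)
    cell = Counter(zip(groups, actions))
    row = Counter(groups)
    col = Counter(actions)
    stat = 0.0
    for g in row:
        for a in col:
            expected = row[g] * col[a] / n
            stat += (cell[(g, a)] - expected) ** 2 / expected
    return stat


def randomization_p_value(groups, actions, n_samples=1000, seed=0):
    """Share of label-shuffled datasets whose chi-square exceeds the actual."""
    rng = random.Random(seed)
    actual = chi_square(groups, actions)
    shuffled = list(groups)
    exceed = 0
    for _ in range(n_samples):
        rng.shuffle(shuffled)  # random group assignment, actions held fixed
        if chi_square(shuffled, actions) > actual:
            exceed += 1
    return actual, exceed / n_samples


# Hypothetical toy data: top action per state for two NFC groups.
groups = ["low"] * 20 + ["high"] * 20
actions = (["explanation"] * 15 + ["no_ai"] * 5
           + ["explanation"] * 5 + ["no_ai"] * 15)
actual_stat, p_value = randomization_p_value(groups, actions)
```

The strict `>` comparison follows the wording of the Figure 4 caption ("exceeded"); some formulations of the test count ties as well, which would give a slightly larger p-value.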