From Text to Trust: Empowering AI-assisted Decision Making with Adaptive LLM-powered Analysis

Zhuoyan Li, Hangxiao Zhu, Zhuoran Lu, Ziang Xiao, Ming Yin

TL;DR

This work investigates how to enhance AI-assisted decision making when AI explanations are unavailable by leveraging LLM-powered analyses of task features. An initial randomized study shows that presenting per-feature analyses sequentially or all at once does not improve performance, motivating an adaptive, data-driven framework that models how analyses influence human decisions and selects analyses to maximize appropriate reliance. The framework yields significant gains in decision accuracy and reductions in overreliance across income prediction and recidivism tasks, while also reducing interaction burden. These findings highlight the potential and risks of algorithmically nudging human decisions with LLM-generated analyses, offering design guidance for future human-AI collaboration in decision making.

Abstract

AI-assisted decision making is becoming increasingly prevalent, yet individuals often fail to utilize AI-based decision aids appropriately, especially when AI explanations are absent, potentially because they do not reflect critically on the AI's decision recommendations. Large language models (LLMs), with their exceptional conversational and analytical capabilities, present great opportunities to enhance AI-assisted decision making in the absence of AI explanations by providing natural-language analysis of the AI's decision recommendation, e.g., how each feature of a decision making task might contribute to the AI recommendation. In this paper, via a randomized experiment, we first show that presenting LLM-powered analysis of each task feature, either sequentially or concurrently, does not significantly improve people's AI-assisted decision performance. To enable decision makers to better leverage LLM-powered analysis, we then propose an algorithmic framework that characterizes the effects of LLM-powered analysis on human decisions and dynamically decides which analysis to present. Our evaluation with human subjects shows that this approach effectively improves decision makers' appropriate reliance on AI in AI-assisted decision making.

Paper Structure

This paper contains 30 sections, 10 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: The example interfaces used in the Seq and All treatments of our experiment for the recidivism prediction task.
  • Figure 2: Comparing the average decision accuracy, overreliance, and underreliance on the AI model for participants across the Control, Seq, and All treatments, for both the income prediction and the recidivism prediction tasks. Error bars represent the 95% confidence intervals of the mean values.
  • Figure 3: Our human behavior model comprises three components. A) Initial State Mapping: This component encodes the decision making task and the AI recommendation into the human's initial hidden state, which integrates the task details and the initial AI insights into the human's decision making process. B) Hidden State Updating: This component characterizes how the human's hidden state evolves based on the presented LLM-powered analysis and the human's reaction to it (i.e., whether they agree or disagree with the LLM's analysis). Each update depends on the previous hidden state, reflecting the iterative incorporation of new information and the human's reasoning into the decision making process. C) Final Decision: This component maps the human's latest hidden state to the actual decision made on the task, translating the understanding and reasoning accumulated in the hidden states into the human's final decision.
  • Figure 4: Comparing the participants' average decision accuracy, overreliance, and underreliance on AI in different treatments for income prediction and recidivism prediction tasks. The pink dashed lines show, for participants in the Human-Solo treatment, (a) the accuracy of their decisions, (b) the frequency at which their decisions align with AI recommendations (despite not seeing them) when the AI recommendations are wrong, and (c) the frequency at which their decisions differ from AI recommendations (despite not seeing them) when the AI recommendations are correct. Error bars (shading) represent the 95% confidence intervals of the mean values. $^{*}$, $^{**}$, and $^{***}$ denote significance levels of $0.05$, $0.01$, and $0.001$, respectively.
  • Figure 5: Comparing the participants' average decision accuracy, overreliance, and underreliance on AI in different treatments for income prediction and recidivism prediction tasks, when fixing the number of interaction rounds at the same level. Error bars represent the 95% confidence intervals of the mean values. $^{*}$, $^{**}$, and $^{***}$ denote significance levels of $0.05$, $0.01$, and $0.001$, respectively.
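
The three outcome measures named in these captions can be made concrete with a small sketch. It follows the wording of the Figure 4 caption: overreliance is the share of tasks with a wrong AI recommendation on which the human decision still matches the AI, and underreliance is the share of tasks with a correct AI recommendation on which the human decision differs from it. The function name `reliance_metrics`, the 0/1 label encoding, and the toy data are assumptions made for illustration; this is not the authors' evaluation code, and the paper may aggregate per participant before averaging.

```python
from typing import Dict, Sequence


def reliance_metrics(
    human: Sequence[int], ai: Sequence[int], truth: Sequence[int]
) -> Dict[str, float]:
    """Accuracy, overreliance, and underreliance for one set of binary decisions.

    Hypothetical helper: labels are assumed to be encoded as 0/1.
    """
    if not (len(human) == len(ai) == len(truth)) or len(human) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    # Accuracy: share of tasks where the human decision matches the ground truth.
    accuracy = sum(h == t for h, t in zip(human, truth)) / len(human)

    # Split tasks by whether the AI recommendation was wrong or correct.
    ai_wrong = [(h, a) for h, a, t in zip(human, ai, truth) if a != t]
    ai_right = [(h, a) for h, a, t in zip(human, ai, truth) if a == t]

    # Overreliance: the human follows the AI on tasks where the AI is wrong.
    over = (
        sum(h == a for h, a in ai_wrong) / len(ai_wrong)
        if ai_wrong
        else float("nan")
    )
    # Underreliance: the human departs from the AI on tasks where the AI is correct.
    under = (
        sum(h != a for h, a in ai_right) / len(ai_right)
        if ai_right
        else float("nan")
    )
    return {"accuracy": accuracy, "overreliance": over, "underreliance": under}


# Toy data (made up): four tasks, AI wrong on the second and third.
metrics = reliance_metrics(
    human=[1, 0, 1, 1],
    ai=[1, 1, 1, 0],
    truth=[1, 0, 0, 0],
)
print(metrics)
```

Under these definitions, "appropriate reliance" improves when both overreliance and underreliance fall, which is why the captions report the two rates alongside accuracy rather than a single agreement rate.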