Understanding the Effects of AI-Assisted Critical Thinking on Human-AI Decision Making

Harry Yizhou Tian, Hasan Amin, Ming Yin

TL;DR

The paper addresses suboptimal human–AI decision making, which it attributes in part to insufficient scrutiny of human reasoning. It introduces the AI-Assisted Critical Thinking (AACT) framework, which uses a domain-specific AI to perform counterfactual analyses of a decision-maker's arguments and guides structured critique and correction grounded in the Recognition/Metacognition model. Through a house price prediction case study, it demonstrates that AACT can reduce over-reliance on AI and enhance decision autonomy, albeit at the cost of higher cognitive load, with benefits amplified for users with higher AI familiarity and task knowledge. The work contributes a formal framework, a concrete conversational AI instantiation, and empirical evidence suggesting practical use in high-stakes or autonomy-valued domains, while outlining design, ethical, and generalizability considerations for deploying reflective AI systems.

Abstract

Despite the growing prevalence of human-AI decision making, the human-AI team's decision performance often remains suboptimal, partially due to insufficient examination of humans' own reasoning. In this paper, we explore designing AI systems that directly analyze humans' decision rationales and encourage critical reflection of their own decisions. We introduce the AI-Assisted Critical Thinking (AACT) framework, which leverages a domain-specific AI model's counterfactual analysis of human decision to help decision-makers identify potential flaws in their decision argument and support the correction of them. Through a case study on house price prediction, we find that AACT outperforms traditional AI-based decision-support in reducing over-reliance on AI, though also triggering higher cognitive load. Subgroup analysis reveals AACT can be particularly beneficial for some decision-makers such as those very familiar with AI technologies. We conclude by discussing the practical implications of our findings, use cases and design choices of AACT, and considerations for using AI to facilitate critical thinking.

Paper Structure

This paper contains 53 sections, 3 equations, 14 figures, 8 tables.

Figures (14)

  • Figure 1: Illustrative example of how AACT supports decision-makers in critiquing and correcting their argument. (A) The decision-maker makes a decision, selects some features as their argument, and reports their confidence. (B) The domain-specific AI model performs counterfactual perspective-taking by adopting the decision-maker's perspective to evaluate its confidence in the decision-maker's decision, then conducting counterfactual analysis by adding or removing features from their argument. (C) Using AI's counterfactual perspective-taking results, AACT prompts the decision-maker to critique their argument through targeted self-reflection, supports revisions with AI-based correction suggestions, and enables data-based triangulation.
  • Figure 2: Flowchart of our AACT implementation. (1) Middle panel: the main workflow of presenting the three types of issues in the decision-maker's argument, along with highlighting agreement between AI and human on reliable features. (2) Left panel: the issues and their associated features are identified using AI's counterfactual perspective-taking results. (3) Right panel: for the incompleteness, unreliability, and conflict stages, AACT engages decision-makers in the critique-and-correction workflow; at the end of each stage, decision-makers can update their decision and argument.
  • Figure 3: Comparison on participants' decision performance measured by their (a) accuracy and (b) balanced accuracy. Error bars represent the 95% confidence intervals of the mean values. $^{*}$, $^{**}$, and $^{***}$ denote statistical significance levels of $0.05$, $0.01$, and $0.001$ respectively.
  • Figure 4: Comparisons on participants' reliance (agreement and switch fraction) and appropriateness of reliance (over-reliance ratio and under-reliance ratio) on AI. Error bars represent the 95% confidence intervals of the mean values. $^{*}$, $^{**}$, and $^{***}$ denote statistical significance levels of $0.05$, $0.01$, and $0.001$ respectively.
  • Figure 5: Comparisons on users' subjective perceptions across treatments. Error bars represent 95% confidence intervals of the mean values. $^{**}$ and $^{***}$ denote statistical significance levels of $0.01$ and $0.001$ respectively.
  • ...and 9 more figures
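
As a rough illustration of the counterfactual perspective-taking described in the Figure 1 caption, the sketch below scores a decision using only the features in the decision-maker's argument, then re-scores it after adding or removing one feature at a time. The additive log-odds model, the feature names, the numeric contributions, and the function names are all hypothetical, chosen for illustration only; the paper's actual model and procedure may differ.

```python
import math

# Hypothetical per-feature contributions (log-odds toward a "high" price)
# for a single house. Illustrative values only, not taken from the paper.
CONTRIBUTIONS = {
    "living_area": 1.2,
    "num_bedrooms": 0.4,
    "year_built": -0.9,
    "neighborhood": 0.7,
}


def confidence(features, decision):
    """Confidence of the toy model in `decision` when it only sees `features`."""
    logit = sum(CONTRIBUTIONS[f] for f in features)
    p_high = 1.0 / (1.0 + math.exp(-logit))
    return p_high if decision == "high" else 1.0 - p_high


def counterfactual_analysis(argument, decision):
    """Return the baseline confidence and the change caused by each single edit.

    `argument` is the set of features the decision-maker selected.
    Features inside the argument are removed one at a time; features
    outside it are added one at a time.
    """
    base = confidence(argument, decision)
    deltas = {}
    for feature in CONTRIBUTIONS:
        if feature in argument:
            edited = argument - {feature}
            deltas[("remove", feature)] = confidence(edited, decision) - base
        else:
            edited = argument | {feature}
            deltas[("add", feature)] = confidence(edited, decision) - base
    return base, deltas


if __name__ == "__main__":
    base, deltas = counterfactual_analysis({"living_area", "num_bedrooms"}, "high")
    print(f"baseline confidence: {base:.3f}")
    for (action, feature), delta in sorted(deltas.items()):
        print(f"{action:>6} {feature:<13} {delta:+.3f}")
```

In this toy setup, an omitted feature whose addition lowers confidence hints that the argument may be incomplete, while an included feature whose removal raises confidence hints that it may be unreliable. How AACT actually maps such changes to the incompleteness, unreliability, and conflict issues is not specified in this summary.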