Metacognition and Confidence Dynamics in Advice Taking from Generative AI

Clara Colombatto, Sean Rintel, Lev Tankelevitch

TL;DR

This paper investigates how prospective confidence in self and GenAI shapes advice-taking from GenAI during a novel event-planning task, and how advice-taking in turn shapes retrospective confidence after the task. Through two preregistered studies (one allowing participants to choose whether to seek advice, the other randomizing exposure to advice), the authors show that confidence in GenAI promotes advice-seeking and reliance, while confidence in one's own abilities can suppress them; in turn, exposure to GenAI advice causally enhances retrospective confidence in both self and GenAI. The results also reveal metacognitive calibration effects: advice can make responses more complete but often disrupts thorough verification, highlighting risks of over-reliance and misattribution of credit to AI. Collectively, the work delineates bidirectional metacognitive dynamics in human-GenAI interaction and points to interventions to improve calibrated reliance in AI-assisted decision making.

Abstract

Generative Artificial Intelligence (GenAI) can aid humans in a wide range of tasks, but its effectiveness critically depends on users being able to evaluate the accuracy of GenAI outputs and their own expertise. Here we asked how confidence in self and GenAI contributes to decisions to seek and rely on advice from GenAI ('prospective confidence'), and how advice-taking in turn shapes this confidence ('retrospective confidence'). In a novel paradigm involving text generation, participants formulated plans for events, and could request advice from a GenAI (Study 1; N=200) or were randomly assigned to receive advice (Study 2; N=300), which they could rely on or ignore. Advice requests in Study 1 were related to higher prospective confidence in GenAI and lower confidence in self. Advice-seekers showed increased retrospective confidence in GenAI, while those who declined advice showed increased confidence in self. Random assignment in Study 2 revealed that advice exposure increases confidence in GenAI and in self, suggesting that GenAI advice-taking causally boosts retrospective confidence. These results were mirrored in advice reliance, operationalised as the textual similarity between GenAI advice and participants' responses, with reliance associated with increased retrospective confidence in both GenAI and self. Critically, participants who chose to obtain/rely on advice provided more detailed responses (likely due to the output's verbosity), but failed to check the output thoroughly, missing key information. These findings underscore a key role for confidence in interactions with GenAI, shaped by both prior beliefs about oneself and the reliability of AI, and context-dependent exposure to advice.

Paper Structure

This paper contains 50 sections, 10 figures, 2 tables.

Figures (10)

  • Figure 1: A Novel Task to Measure Reliance on GenAI. Participants were asked to formulate a plan for an upcoming event, and had the opportunity to request advice from a GenAI system. Each participant completed a total of four events (work retreat, office recruitment, camping trip, dinner party) in a randomized order. Note that text has been simplified for visualization purposes.
  • Figure 2: Relationship Between Confidence and Advice-Taking. (A) Number of subjects who requested advice on no trials (0), some (1-3), or all trials (4). (B) Participants who were more confident in themselves were less likely to request advice; those who were more confident in GenAI were instead more likely to request advice. Points correspond to averages across all participants for each confidence value (from 1 to 7), and error bars correspond to standard errors; for this and subsequent plots, lines correspond to best-fit linear regression lines, and shaded bands represent 95% confidence intervals. (C) Density of the similarity between GenAI advice and participants' plans, separately for participants who requested (blue) or declined (red) advice. (D) Participants who were more confident in themselves were less likely to rely on the advice (measured as the cosine similarity between the advice provided by the GenAI and the plan submitted by participants); those who were more confident in the GenAI were instead more likely to rely on the advice. Jittered points correspond to single trials where participants requested advice (with confidence in GenAI averaged across two questions; note that, for this and subsequent plots, points corresponding to individual trials were jittered to improve the visibility of overlapping observations within each confidence level, while still preserving clear separation between confidence levels).
  • Figure 3: Changes from Prospective to Retrospective Confidence in Study 1. (A) Confidence in self was overall highly consistent from before to after the task, but participants who declined advice showed a boost in confidence after the task. (B) Confidence in GenAI was overall highly consistent from before to after the task, but participants who requested advice showed a boost in confidence after the task. Jittered points correspond to single trials (with confidence in GenAI averaged across two questions).
  • Figure 4: Confidence Calibration in Study 1. (A) Participants who declined advice were more likely to include in their plan a piece of information that was removed from the advice. Confidence in both self and GenAI was overall unrelated to this measure of response verification (perhaps due to floor effects as per the overall low response verification rates). Points correspond to averages across all participants for each confidence value (from 1 to 7). (B) Participants who requested advice provided more complete responses (perhaps due to high reliance on the advice, which was highly detailed). Participants who were more confident in themselves provided more complete responses, but only when they declined advice; those who requested the advice instead showed a negative relationship between accuracy and confidence, suggesting that advice from GenAI disrupts metacognitive calibration. Jittered points correspond to single trials (with confidence in GenAI averaged across two questions).
  • Figure 5: Changes from Prospective to Retrospective Confidence in Study 2. (A) Confidence in self was overall consistent from before to after the task, but there was a greater increase from prospective to retrospective confidence in participants who were exposed to GenAI advice. (B) Confidence in GenAI was overall consistent from before to after the task, but there was a greater increase from prospective to retrospective confidence in participants who were exposed to GenAI advice. Jittered points correspond to single trials.
  • ...and 5 more figures
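
The Figure 2 caption describes reliance as the cosine similarity between the GenAI advice and the participant's submitted plan. The summary does not say how the texts were turned into vectors, so the sketch below assumes plain bag-of-words counts purely for illustration; the authors may well have used a different representation (for example, sentence embeddings). The function names and the example texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between bag-of-words count vectors of two texts.

    Assumption: word counts stand in for whatever text representation
    the paper actually used, which this summary does not specify.
    """
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Dot product over the words that appear in both texts.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty text shares nothing with any other text.
    return dot / (norm_a * norm_b)


# Hypothetical texts, not taken from the study materials.
advice = "Book a venue, arrange catering, and send invitations two weeks ahead."
plan_copied = "Book a venue, arrange catering, and send invitations two weeks ahead."
plan_own = "Ask colleagues for preferred dates and pick a hiking trail."

high_reliance = cosine_similarity(advice, plan_copied)
low_reliance = cosine_similarity(advice, plan_own)
```

Under this reading, a plan copied from the advice scores close to 1 and an independently written plan scores much lower, which is the sense in which higher similarity is treated as higher reliance.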