An Empirical Examination of the Evaluative AI Framework

Jaroslaw Kornowicz

TL;DR

Findings from the current behavioral experiment reveal no significant improvement in decision-making performance and limited user engagement with the evidence provided, resulting in cognitive processes similar to those observed in traditional AI systems.

Abstract

This study empirically examines the "Evaluative AI" framework, which aims to improve decision-making for AI users by shifting from a recommendation-based approach to a hypothesis-driven one. Rather than offering direct recommendations, the framework presents users with evidence for and against hypotheses to support more informed decisions. However, findings from the current behavioral experiment reveal no significant improvement in decision-making performance and limited user engagement with the evidence provided, resulting in cognitive processes similar to those observed in traditional AI systems. Despite these results, the framework still holds promise for further exploration in future research.

Paper Structure

This paper contains 22 sections, 1 equation, 16 figures.

Figures (16)

  • Figure 1: Mean Absolute SHAP Values for Features in Model Interpretation. The horizontal bar chart ranks features based on their mean absolute SHAP values, which indicate the average impact of each feature on the model’s predictions.
  • Figure 2: Comparison of Brier Scores Across Different Treatments. The bar chart presents the Brier score performance (lower values indicate better predictive accuracy) for various treatment groups: Control, Recommendation Only, Evidence Only, Recommendation and Evidence, and Evaluative AI. Error bars denote the 95% confidence intervals. Horizontal dashed lines indicate benchmarks for AI Model (red) and Random Guess (green) performance. The results suggest similar performance levels across treatments, with no significant deviations observed.
  • Figure 3: Average Task Completion Time Across Different Treatments. This bar chart shows the average time taken (in seconds) to complete tasks for each treatment group: Control, Recommendation Only, Evidence Only, Recommendation and Evidence, and Evaluative AI. Error bars represent the 95% confidence intervals. P-values above the bars mark the pairs of treatments that differ significantly ($p<0.01$, $p<0.001$).
  • Figure 4: Cognitive Load (NASA-TLX) Across Different Treatments. The bar chart illustrates the cognitive load scores, as measured by the NASA Task Load Index (TLX), for each treatment group: Control, Recommendation Only, Evidence Only, Recommendation and Evidence, and Evaluative AI. Error bars show the 95% confidence intervals, providing an indication of the variability within each group. The results suggest similar cognitive load levels across treatments, with no significant deviations observed.
  • Figure 5: Frequency of AI Mentions by Treatment Group. This bar chart displays the percentage of participants mentioning AI across four treatment groups: Recommendation Only, Evidence Only, Recommendation and Evidence, and Evaluative AI.
  • ...and 11 more figures
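
For readers unfamiliar with the Brier score used in Figure 2, the following is a minimal sketch of how it is commonly computed for binary outcomes. The function name, the sample forecasts, and the reading of the "Random Guess" benchmark as a constant forecast of 0.5 are illustrative assumptions, not details taken from the paper.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and binary outcomes.

    Lower is better: 0.0 is a perfect forecast, 1.0 is the worst possible.
    (Hypothetical helper for illustration; not the paper's code.)
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    if not forecasts:
        raise ValueError("at least one forecast is required")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Made-up outcomes (1 = event occurred, 0 = it did not).
outcomes = [1, 0, 1, 1]

# Assumption: a "random guess" always forecasts 0.5, which scores 0.25
# regardless of the outcomes.
random_guess_score = brier_score([0.5] * len(outcomes), outcomes)

# A forecaster who leans the right way on every case scores lower (better).
informed_score = brier_score([0.9, 0.1, 0.8, 0.7], outcomes)
```

Under this definition, a treatment group whose mean score sits near 0.25 is doing no better than an uninformative constant forecast, which is why a benchmark line of that kind is useful in a comparison chart.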