The Value of AI Advice: Personalized and Value-Maximizing AI Advisors Are Necessary to Reliably Benefit Experts and Organizations

Nicholas Wolczynski, Maytal Saar-Tsechansky, Tong Wang

TL;DR

The paper confronts the mismatch between AI advisor performance and real-world value in high-stakes decisions, arguing that reliability and value-maximization require accounting for advising costs and human behavior. It introduces the ReV-AI framework and a concrete, interpretable implementation called TeamRules (TR), designed to learn context-specific, selective advice with inherent persuasiveness. Through synthetic and empirical analyses across multiple domains and decision-maker behaviors, TR consistently adds more value than traditional task-focused or non-personalized advisors, especially under costly or miscalibrated human advice-taking. The work demonstrates that superhuman AI accuracy is neither necessary nor sufficient for value, and that value-driven, customizable advisors can substantially improve outcomes while mitigating potential harms. These findings carry important managerial implications for deploying AI advisors in real organizations and lay groundwork for future extensions to broader HCI configurations and decision domains.

Abstract

Despite advances in AI's performance and interpretability, AI advisors can undermine experts' decisions and increase the time and effort experts must invest to make decisions. Consequently, AI systems deployed in high-stakes settings often fail to consistently add value across experts and organizations and can even diminish the value that experts alone provide. Beyond harm in specific domains, such outcomes impede progress in research and practice, underscoring the need to understand when and why different AI advisors add or diminish value. To bridge this gap, we stress the importance of assessing the value AI advice brings to real-world contexts when designing and evaluating AI advisors. Building on this perspective, we characterize key pillars -- pathways through which AI advice impacts value -- and develop a framework that incorporates these pillars to create reliable, personalized, and value-adding advisors. Our results highlight the need for value-driven development of AI advisors that advise selectively, are tailored to experts' unique behaviors, and are optimized for context-specific trade-offs between decision improvements and advising costs. They also reveal how the lack of inclusion of these pillars in the design of AI advising systems may be contributing to the failures observed in practical applications.

Paper Structure

This paper contains 25 sections, 6 equations, 6 figures, and 3 tables.

Figures (6)

  • Figure 1: AI-Advised Decision Making
  • Figure 2: Framework for producing reliable, value-maximizing AI advisors
  • Figure 3: TeamRules Advising Process in Deployment
  • Figure 4: Value added by different AI advisors relative to an independent human for environments with varying trade-offs between advising costs and decision benefits for the Heart Disease, FICO, and HR datasets. The x-axis reflects contexts in which the cost of 1/x human engagements is equivalent to the loss from one incorrect decision; higher x-values mean that fewer human engagements are permissible to achieve a given decision benefit. Task-only model generalization accuracy is 0.78 (Heart Disease), 0.71 (FICO), 0.79 (HR). The human's standalone generalization accuracy for each dataset and decision behavior is 0.71 (Heart Disease Difficulty-Biased), 0.83 (Heart Disease Group-Biased), 0.73 (FICO Difficulty-Biased), 0.70 (FICO Group-Biased), 0.70 (HR Difficulty-Biased), 0.71 (HR Group-Biased). Thus, plots a, c, e, f, g, and h show settings in which a task-only model can achieve superhuman accuracy, while plots b and d show settings in which a task-only model cannot reach human performance. Results show average value added $\pm$ SE (shaded region) over 10 repetitions.
  • Figure 5: Case Study 1 - Advising Outcomes, defined in Table \ref{tab:measures}. The TR, task-only, and TR-No(ADB) advisors advise an over-confident expert. The context's trade-off is $0.1$, reflecting that reconciling up to 10 contradictory pieces of AI advice is deemed cost-effective if the contradictions yield at least one improved decision. The advising costs shown are converted to units of decision loss given the trade-off (AU). The expert has an independent decision accuracy of 90% on the Male population (majority), 60% on the Female population (minority), and 87.5% on the entire population. This expert is highly confident (97.5% confidence) on the entire population. Results are averages over 20 repetitions $\pm$ SE (vertical line at center of each bar).
  • ...and 1 more figure
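
The trade-off described in the captions for Figures 4 and 5 can be illustrated with a small arithmetic sketch. It assumes, purely for illustration, that value is linear: net decision improvements minus the number of human engagements multiplied by the trade-off. This linear form and the function names `advising_cost`, `net_value`, and `minority_share` are hypothetical and are not taken from the paper, which may define value differently.

```python
def advising_cost(num_engagements: int, trade_off: float) -> float:
    """Cost of human engagements, in units of one incorrect decision.

    Assumption: cost grows linearly with the number of engagements.
    """
    return num_engagements * trade_off


def net_value(improved: int, worsened: int,
              num_engagements: int, trade_off: float) -> float:
    """Net decision improvements minus advising cost (assumed linear form)."""
    return (improved - worsened) - advising_cost(num_engagements, trade_off)


def minority_share(acc_majority: float, acc_minority: float,
                   acc_overall: float) -> float:
    """Minority fraction implied by group accuracies and overall accuracy.

    Solves acc_overall = (1 - p) * acc_majority + p * acc_minority for p.
    """
    return (acc_majority - acc_overall) / (acc_majority - acc_minority)


if __name__ == "__main__":
    # Trade-off of 0.1: ten contradictions that yield one improved
    # decision break even under the assumed linear form.
    print(net_value(improved=1, worsened=0, num_engagements=10, trade_off=0.1))

    # Figure 5 accuracies: 90% (majority), 60% (minority), 87.5% (overall).
    print(minority_share(0.90, 0.60, 0.875))
```

Under this assumption, a trade-off of 0.1 makes ten engagements cost exactly as much as one incorrect decision, which matches the break-even reading in the Figure 5 caption. The accuracies quoted there are consistent with a minority group making up roughly one twelfth of the population.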

Theorems & Definitions (2)

  • Definition 1: Rule
  • Definition 2: Rule Set