Underspecified Human Decision Experiments Considered Harmful

Jessica Hullman, Alex Kale, Jason Hartline

TL;DR

The paper defines a normative decision framework by merging statistical decision theory and information economics to identify what constitutes a well-defined decision problem in studies of human decisions from displays. It argues that many AI-assisted decision studies are underspecified, making it difficult to attribute observed performance losses to bias. A coding of 46 surveyed studies shows that only about a quarter of applicable studies gave participants sufficient information to determine the normative decision in at least one condition, and many did not use the same scoring rule to incentivize participants as to analyze their responses. Through examples of AI-assisted flight booking and election forecasts, the authors illustrate how to redesign experiments so that posterior beliefs and scoring rules align with a clearly defined decision problem. The work offers concrete guidelines for experiment design and ethics, aiming to improve the validity and generalizability of conclusions about human decision-making in HCI, HCAI, and visualization contexts.

Abstract

Decision-making with information displays is a key focus of research in areas like human-AI collaboration and data visualization. However, what constitutes a decision problem, and what is required for an experiment to conclude that decisions are flawed, remain imprecise. We present a widely applicable definition of a decision problem synthesized from statistical decision theory and information economics. We claim that to attribute loss in human performance to bias, an experiment must provide the information that a rational agent would need to identify the normative decision. We evaluate whether recent empirical research on AI-assisted decisions achieves this standard. We find that only 10 (26%) of 39 studies that claim to identify biased behavior presented participants with sufficient information to make this claim in at least one treatment condition. We motivate the value of studying well-defined decision problems by describing the characterization of performance losses that such problems make possible.

Paper Structure (23 sections, 7 equations, 2 figures, 1 table)

Figures (2)

  • Figure 1: Diagram depicting normative decision for example AI-assisted flight booking scenario. From left to right: The agent is informed of the decision problem, including the action, scoring rule, and prior information about the data-generating model. They next view a signal generated by the data-generating model, which is correlated with the state. The agent updates their beliefs about the state, then chooses the score-maximizing action (in this case, to not book the flight).
  • Figure 2: Results from coding 46 studies surveyed by Lai et al. (2023). Seven studies did not evaluate human decisions, instead focusing on capturing perceptions or subjective appraisals. Of the remaining 39, 25 did not communicate sufficient information to participants for them to identify the best response. Additionally, 25 did not use the same scoring rule for incentivizing participants as for analyzing their responses.
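
The normative decision described in the Figure 1 caption (prior, signal, belief update, score-maximizing action) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the state names, the 50/50 prior, the 80% signal accuracy, and the payoffs are all hypothetical values chosen so that the best action is to not book, as in the caption.

```python
from typing import Dict, List, Tuple

State = str
Action = str
Signal = str


def posterior(prior: Dict[State, float],
              likelihood: Dict[State, Dict[Signal, float]],
              signal: Signal) -> Dict[State, float]:
    """Bayes' rule: P(state | signal) is proportional to P(signal | state) * P(state)."""
    unnormalized = {s: prior[s] * likelihood[s][signal] for s in prior}
    total = sum(unnormalized.values())
    return {s: p / total for s, p in unnormalized.items()}


def expected_score(action: Action,
                   belief: Dict[State, float],
                   score: Dict[Tuple[Action, State], float]) -> float:
    """Expected value of the scoring rule for one action under the current belief."""
    return sum(belief[s] * score[(action, s)] for s in belief)


def normative_action(actions: List[Action],
                     belief: Dict[State, float],
                     score: Dict[Tuple[Action, State], float]) -> Action:
    """The action a rational agent would choose: the one maximizing expected score."""
    return max(actions, key=lambda a: expected_score(a, belief, score))


# Hypothetical decision problem (all numbers are assumptions for illustration).
prior = {"price_rises": 0.5, "price_drops": 0.5}

# Hypothetical data-generating model: the AI's prediction matches the state 80% of the time.
likelihood = {
    "price_rises": {"predicts_rise": 0.8, "predicts_drop": 0.2},
    "price_drops": {"predicts_rise": 0.2, "predicts_drop": 0.8},
}

# Hypothetical scoring rule: booking now is the baseline (0);
# waiting gains 1 if the price drops and loses 1 if it rises.
actions = ["book", "wait"]
score = {
    ("book", "price_rises"): 0.0,
    ("book", "price_drops"): 0.0,
    ("wait", "price_rises"): -1.0,
    ("wait", "price_drops"): 1.0,
}

# The agent sees the signal, updates beliefs, and picks the best response.
belief = posterior(prior, likelihood, "predicts_drop")
best = normative_action(actions, belief, score)
print(belief, best)
```

In this sketch the normative action is computable only because the prior, the signal's likelihood, and the scoring rule are all stated. Removing any one of them leaves the best response undefined, which is the sense in which an experiment is underspecified.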