(De)Noise: Moderating the Inconsistency Between Human Decision-Makers

Nina Grgić-Hlača; Junaid Ali; Krishna P. Gummadi; Jennifer Wortman Vaughan

TL;DR

This study examines whether algorithmic decision aids can reduce inconsistency in human decision-making, using real estate appraisal as the test case. It compares five review conditions (T1-T5): respondents review their estimates one by one (T1, T2) or in randomly (T3) or meaningfully (T4, T5) selected pairs, with explicit machine advice added in T2 and T5. The authors develop two decision aids, one providing traditional machine advice (T2) and one underlying the algorithmically chosen pairwise reviews (T4/T5). Compared to reviewing estimates one by one, both strategies increase respondents' propensity to update, the accuracy of post-review estimates, and the consistency between respondents. The effects are more pronounced with traditional machine advice, but the pairwise approach does not require ground truth data, which broadens its applicability to settings where objective ground truth is costly or ill-defined. The paper also highlights normative considerations in deploying such aids.

Abstract

Prior research in psychology has found that people's decisions are often inconsistent. An individual's decisions vary across time, and decisions vary even more across people. Inconsistencies have been identified not only in subjective matters, like matters of taste, but also in settings one might expect to be more objective, such as sentencing, job performance evaluations, or real estate appraisals. In our study, we explore whether algorithmic decision aids can be used to moderate the degree of inconsistency in human decision-making in the context of real estate appraisal. In a large-scale human-subject experiment, we study how different forms of algorithmic assistance influence the way that people review and update their estimates of real estate prices. We find that both (i) asking respondents to review their estimates in a series of algorithmically chosen pairwise comparisons and (ii) providing respondents with traditional machine advice are effective strategies for influencing human responses. Compared to simply reviewing initial estimates one by one, the aforementioned strategies lead to (i) a higher propensity to update initial estimates, (ii) a higher accuracy of post-review estimates, and (iii) a higher degree of consistency between the post-review estimates of different respondents. While these effects are more pronounced with traditional machine advice, the approach of reviewing algorithmically chosen pairs can be implemented in a wider range of settings, since it does not require access to ground truth data.

Paper Structure (31 sections, 2 equations, 14 figures, 4 tables)

Introduction
Background
Machine-Assisted Decision-Making
The (In)Consistency of Human Decisions
Anticipated Reactions to Feedback about Inconsistency
Methodology
Experimental Design
Stimulus Material
Data Collection
Decision Aids
Developing the Decision Aid Utilized in T2
Developing the Decision Aid Utilized in T4 and T5
Results
H1: Overall Change in Decisions
H1': Propensity to Change Particular Decisions
...and 16 more sections

Figures (14)

Figure 1: Graphical overview of experimental conditions T1-T5. In T1 and T2, respondents review their decisions one-by-one, while in T3-T5 they review decisions in randomly (T3) or meaningfully (T4 and T5) selected pairs. In T2 and T5 respondents are additionally provided with (different kinds of) explicit machine advice.
Figure 2: Description of the experimental design shown to participants at the beginning of the experiment.
Figure 3: Stimulus material.
Figure 4: Average duration of the experiment, per experimental condition, and per experimental phase. The experimental conditions T1-T5 are shown on the x-axis. The values for the pre-review experimental phase are shown in blue, while the post-review values are shown in orange. We report mean values calculated across respondents $\pm$ 1.96 standard errors of the mean (SEM).
Figure 5: H1: Effect of the interventions on people's propensity to update decisions, across all 30 apartments. The experimental conditions T1-T5 are shown on the x-axis. We report mean values calculated across respondents $\pm$ 1.96 standard errors of the mean (SEM).
...and 9 more figures
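
The captions for Figures 4 and 5 report mean values across respondents $\pm$ 1.96 standard errors of the mean (SEM). The following is a minimal sketch of that calculation, not the authors' analysis code. It assumes the SEM is the sample standard deviation (n - 1 denominator) divided by the square root of the number of respondents; the function name `mean_with_interval` and the example values are hypothetical.

```python
import math
import statistics


def mean_with_interval(values, z=1.96):
    """Return (mean, half_width), where half_width = z * SEM.

    Assumption: SEM = sample standard deviation / sqrt(n).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to estimate the SEM")
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # stdev uses the n - 1 denominator
    return mean, z * sem


# Hypothetical per-respondent values, e.g. the share of estimates each
# respondent updated in one condition (made-up numbers, not from the paper).
update_rates = [0.2, 0.4, 0.6, 0.8]
mean, half_width = mean_with_interval(update_rates)
print(f"{mean:.3f} ± {half_width:.3f}")
```

With z = 1.96 the half-width corresponds to an approximate 95% interval under a normal approximation, which is why that multiplier is commonly used for error bars of this kind.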
