AI Oversight and Human Mistakes: Evidence from Centre Court
David Almog, Romain Gauriot, Lionel Page, Daniel Martin
TL;DR
This paper studies how AI oversight via Hawk-Eye review influences human umpire decisions in professional tennis, comparing umpire performance before and after the introduction of AI review. It combines three data sources (a ground-truth Hawk-Eye Base dataset, a Challenge dataset, and Video Auditing) into a merged panel of 698 matches, and applies a two-stage rational-inattention model with state-dependent AI-overrule penalties, yielding estimates of $c^{I}$ and $c^{O}$. The main findings are that AI oversight reduces the overall mistake rate for close calls but increases the likelihood that umpires call balls near the line in, shifting errors from Type II (calling a ball out when it is in) to Type I (calling a ball in when it is out). Structurally, umpires care 37% more about Type II errors after the introduction of Hawk-Eye review because of the AI-overrule penalty. Heterogeneity analysis by umpire skill and play type shows that less-skilled umpires react more strongly to AI oversight and that effects differ between serves and non-serves, with more nuanced patterns in the closest calls. The work provides a framework for evaluating the welfare effects of AI oversight, weighing gains from improved accuracy against behavioral costs and incentive misalignment in high-stakes decision environments.
Abstract
Powered by the increasing predictive capabilities of machine learning algorithms, artificial intelligence (AI) systems have the potential to overrule human mistakes in many settings. We provide the first field evidence that the use of AI oversight can impact human decision-making. We investigate one of the highest-visibility settings where AI oversight has occurred: Hawk-Eye review of umpires in top tennis tournaments. We find that umpires lowered their overall mistake rate after the introduction of Hawk-Eye review, but also that umpires increased the rate at which they called balls in, producing a shift from making Type II errors (calling a ball out when in) to Type I errors (calling a ball in when out). We structurally estimate the psychological costs of being overruled by AI using a model of attention-constrained umpires, and our results suggest that because of these costs, umpires cared 37% more about Type II errors under AI oversight.
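
To make the error taxonomy concrete, the following is a minimal sketch of how the overall mistake rate and its split into Type I and Type II errors could be tallied from a list of calls. It is illustrative only and is not the paper's estimation code: the `Call` record, the `error_rates` function, and the toy calls are all hypothetical, and the toy values are made up and carry no empirical meaning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One line call (hypothetical record layout, not the paper's schema)."""
    called_in: bool  # the umpire's call
    truly_in: bool   # ground truth, e.g. from ball tracking


def error_rates(calls):
    """Return the overall mistake rate and its Type I / Type II components.

    Type I:  ball called in when it was out.
    Type II: ball called out when it was in.
    All three rates use the total number of calls as the denominator,
    so the two components sum to the overall mistake rate.
    """
    n = len(calls)
    if n == 0:
        raise ValueError("need at least one call")
    type_i = sum(1 for c in calls if c.called_in and not c.truly_in)
    type_ii = sum(1 for c in calls if not c.called_in and c.truly_in)
    return {
        "mistake_rate": (type_i + type_ii) / n,
        "type_i_rate": type_i / n,
        "type_ii_rate": type_ii / n,
    }


# Made-up toy data: 7 correct calls, 1 Type I error, 2 Type II errors.
toy_calls = (
    [Call(called_in=True, truly_in=True)] * 4
    + [Call(called_in=False, truly_in=False)] * 3
    + [Call(called_in=True, truly_in=False)] * 1
    + [Call(called_in=False, truly_in=True)] * 2
)

rates = error_rates(toy_calls)
print(rates)
```

Under this accounting, a shift from Type II to Type I errors of the kind described above would appear as `type_ii_rate` falling and `type_i_rate` rising between the pre- and post-review samples, even when `mistake_rate` declines overall.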
