AI Oversight and Human Mistakes: Evidence from Centre Court

David Almog, Romain Gauriot, Lionel Page, Daniel Martin

TL;DR

This paper studies how AI oversight via Hawk-Eye influences human umpire decisions in professional tennis, using three data sources to compare performance before and after AI review. It combines a ground-truth Hawk-Eye Base dataset, a Challenge dataset, and Video Auditing into a merged panel of 698 matches. It then applies a two-stage rational-inattention model with state-dependent AI-overrule penalties, which yields estimates of the cost parameters $c^{I}$ and $c^{O}$. The main findings show that AI oversight reduces the overall mistake rate for close calls but increases the likelihood of calling balls in near the line, shifting errors from Type II to Type I. Structurally, umpires exhibit 37% greater concern for Type II errors after the introduction of Hawk-Eye because of the AI penalty. Heterogeneity analysis by umpire skill and play type shows that less-skilled umpires react more strongly to AI oversight and that the effects differ between serves and non-serves, with more nuanced patterns for the closest calls. The work provides a framework for evaluating the welfare effects of AI oversight, weighing potential gains from improved accuracy against behavioral costs and incentive misalignment in high-stakes decision environments.

Abstract

Powered by the increasing predictive capabilities of machine learning algorithms, artificial intelligence (AI) systems have the potential to overrule human mistakes in many settings. We provide the first field evidence that the use of AI oversight can impact human decision-making. We investigate one of the highest visibility settings where AI oversight has occurred: Hawk-Eye review of umpires in top tennis tournaments. We find that umpires lowered their overall mistake rate after the introduction of Hawk-Eye review, but also that umpires increased the rate at which they called balls in, producing a shift from making Type II errors (calling a ball out when in) to Type I errors (calling a ball in when out). We structurally estimate the psychological costs of being overruled by AI using a model of attention-constrained umpires, and our results suggest that because of these costs, umpires cared 37% more about Type II errors under AI oversight.
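The structural idea in the abstract, that a cost attached to being overruled can push an attention-constrained umpire toward calling balls in, can be illustrated with a small numerical sketch. This is not the authors' two-stage model or estimator. It is a one-stage binary rational-inattention problem with Shannon attention costs, whose optimal choice rule has the logit form derived by Matějka and McKay, solved here by Blahut-Arimoto iteration. The function names, the parameters `attention_cost`, `cost_type1` and `cost_type2`, and all numeric values are hypothetical; raising `cost_type2` merely stands in for an AI-overrule penalty on Type II errors.

```python
import math

STATES = ("in", "out")             # where the ball actually landed
ACTIONS = ("call_in", "call_out")  # the umpire's possible calls


def utility(action, state, cost_type1, cost_type2):
    """Payoff of a call: 0 if correct, minus the relevant (hypothetical) error cost otherwise."""
    if action == "call_in" and state == "out":
        return -cost_type1  # Type I error: ball called in when it was out
    if action == "call_out" and state == "in":
        return -cost_type2  # Type II error: ball called out when it was in
    return 0.0


def ri_call_probabilities(prior_in, attention_cost, cost_type1, cost_type2,
                          tol=1e-12, max_iter=100_000):
    """Solve a binary Shannon rational-inattention problem (illustrative sketch).

    Optimal behaviour has the logit form P(a|s) proportional to
    q(a) * exp(u(a, s) / attention_cost), where q(a) is the unconditional
    call rate; q is found by Blahut-Arimoto fixed-point iteration.
    Returns (q, P) with P[state][action] the conditional call probabilities.
    """
    prior = {"in": prior_in, "out": 1.0 - prior_in}

    def conditional(q):
        probs = {}
        for s in STATES:
            weights = {a: q[a] * math.exp(utility(a, s, cost_type1, cost_type2) / attention_cost)
                       for a in ACTIONS}
            total = sum(weights.values())
            probs[s] = {a: w / total for a, w in weights.items()}
        return probs

    q = {a: 1.0 / len(ACTIONS) for a in ACTIONS}  # start from uniform call rates
    for _ in range(max_iter):
        probs = conditional(q)
        # Update unconditional call rates by averaging conditional rates over the prior.
        new_q = {a: sum(prior[s] * probs[s][a] for s in STATES) for a in ACTIONS}
        converged = max(abs(new_q[a] - q[a]) for a in ACTIONS) < tol
        q = new_q
        if converged:
            break
    return q, conditional(q)


# Symmetric error costs versus an extra (hypothetical) penalty on Type II errors.
q_sym, p_sym = ri_call_probabilities(prior_in=0.5, attention_cost=1.0, cost_type1=1.0, cost_type2=1.0)
q_pen, p_pen = ri_call_probabilities(prior_in=0.5, attention_cost=1.0, cost_type1=1.0, cost_type2=2.0)
for label, p in (("symmetric", p_sym), ("Type II penalty", p_pen)):
    print(label, "Type I rate:", round(p["out"]["call_in"], 3),
          "Type II rate:", round(p["in"]["call_out"], 3))
```

With these made-up values, raising the Type II cost from 1 to 2 moves the Type I rate from about 0.27 to about 0.48 and the Type II rate from about 0.27 to about 0.05. This is the same qualitative shift from Type II to Type I errors described above; the magnitudes carry no information about the paper's estimates.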

Paper Structure (22 sections, 10 equations, 12 figures, 8 tables)

Figures (12)

  • Figure 1: Incorrect call rates by proximity to the line. Each dot is the incorrect-call rate for a 20 mm bin. Dots to the left of the dashed line represent out-of-bounds bins, and dots to the right of the dashed line represent in-bounds bins.
  • Figure 2: Impact of AI oversight on the incorrect call rate by proximity to the line. Each dot represents the coefficient on the interaction between the distance bin and PostHK, the indicator variable that equals 1 if the Hawk-Eye review system is active. Controls for point and match characteristics are included.
  • Figure 3: Impact of AI oversight on the incorrect call rate by proximity to the line (by time sub-period). Matches are grouped into those with Hawk-Eye review in all of 2006 and the first half of 2007 and those with Hawk-Eye review in the second half of 2007 and all of 2008. Each dot represents the coefficient on the interaction between the distance bin and PostHK, the indicator variable that equals 1 if the Hawk-Eye review system is active.
  • Figure 4: The rate of calling a ball in after the introduction of Hawk-Eye review for balls landing <20 mm from the line, regardless of which side of the line they actually bounced on. Each dot represents the rate of calling a ball in for one tournament. The red line is the best linear fit using the dots as observations, weighted by the number of calls each tournament contributed. The solid blue line represents the rate of calling a ball in for the seven tournaments that did not have AI review (the dashed blue lines indicate the 95% confidence interval).
  • Figure 5: Impact of AI oversight on the incorrect call rate by proximity to the line (by tournament stage). Matches are grouped into those at the final, semifinal, and quarterfinal stages and those at all other (earlier) stages of a tournament and then analyzed separately. Each dot represents the coefficient on the interaction between the distance bin and PostHK, the indicator variable that equals 1 if the Hawk-Eye review system is active.
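The binning behind Figures 1 and 2 can be sketched as follows. This is a minimal illustration on invented records, not the authors' code or data: the `Call` record, its signed `distance_mm` convention (negative for out, non-negative for in), and the helper names are hypothetical. It also omits the point and match controls used in the paper. Without those controls, in a linear probability model containing only bin dummies and their interactions with PostHK, the coefficient on each bin-by-PostHK interaction reduces to the post-minus-pre difference in that bin's incorrect-call rate, which is what `post_minus_pre_by_bin` returns.

```python
import math
from collections import defaultdict
from dataclasses import dataclass

BIN_MM = 20  # bin width used in Figures 1 and 2


@dataclass
class Call:
    """One line call (hypothetical record layout)."""
    distance_mm: float  # signed distance from the line: < 0 out, >= 0 in (assumed convention)
    called_in: bool     # the umpire's call
    post_hk: bool       # PostHK: True if Hawk-Eye review was active


def bin_index(distance_mm: float) -> int:
    """Index of the 20 mm bin; negative indices are out-of-bounds bins."""
    return math.floor(distance_mm / BIN_MM)


def is_incorrect(call: Call) -> bool:
    """A call is incorrect when it disagrees with where the ball landed."""
    landed_in = call.distance_mm >= 0
    return call.called_in != landed_in


def incorrect_rate_by_bin(calls):
    """Share of incorrect calls in each bin (the quantity plotted in Figure 1)."""
    counts = defaultdict(lambda: [0, 0])  # bin -> [incorrect, total]
    for call in calls:
        cell = counts[bin_index(call.distance_mm)]
        cell[0] += int(is_incorrect(call))
        cell[1] += 1
    return {b: wrong / total for b, (wrong, total) in sorted(counts.items())}


def post_minus_pre_by_bin(calls):
    """Post-minus-pre change in each bin's incorrect-call rate (no controls)."""
    pre = incorrect_rate_by_bin([c for c in calls if not c.post_hk])
    post = incorrect_rate_by_bin([c for c in calls if c.post_hk])
    return {b: post[b] - pre[b] for b in sorted(pre.keys() & post.keys())}


# Toy records, invented for illustration only.
calls = [
    Call(-5, True, False), Call(-15, False, False),
    Call(5, False, False), Call(10, True, False),
    Call(-8, True, True), Call(-12, True, True),
    Call(3, True, True), Call(18, True, True),
]
print(post_minus_pre_by_bin(calls))  # {-1: 0.5, 0: -0.5}
```

Reproducing the paper's coefficients would require its data and its full set of point and match controls, which a regression package rather than simple cell means would handle.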