Overcoming the Incentive Collapse Paradox

Qichuan Yin, Ziwei Su, Shuangning Li

Abstract

AI-assisted task delegation is increasingly common, yet human effort in such systems is costly and typically unobserved. Recent work (Bastani and Cachon, 2025; Sambasivan et al., 2021) shows that accuracy-based payment schemes suffer from incentive collapse: as AI accuracy improves, sustaining positive human effort requires unbounded payments. We study this problem in a budget-constrained principal-agent framework with strategic human agents whose output accuracy depends on unobserved effort. We propose a sentinel-auditing payment mechanism that enforces a strictly positive and controllable level of human effort at finite cost, independent of AI accuracy. Building on this incentive-robust foundation, we develop an incentive-aware active statistical inference framework that jointly optimizes (i) the auditing rate and (ii) active sampling and budget allocation across tasks of varying difficulty to minimize the final statistical loss under a single budget. Experiments demonstrate improved cost-error tradeoffs relative to standard active learning and auditing-only baselines.

Paper Structure

This paper contains 27 sections, 6 theorems, 87 equations, 4 figures, 1 algorithm.

Key Result

Theorem 3.1

Under the regularity assumption, suppose the payment $W$ depends linearly on $Z_1,\ldots,Z_n$ and the utility function $\mu(\cdot)$ is risk-neutral, i.e., $\mu(w)=w$. To sustain any effort level $e > e_{\min} > 0$, the expected payment must satisfy a lower bound, where $C>0$ is a constant determined by the effort $e_{\min}$.
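The incentive collapse behind this result can be illustrated with a toy model. The sketch below is not the paper's model: it assumes, purely for illustration, that a task is correct with probability $a + (1-a)e$ for AI accuracy $a$ and effort $e$, that effort costs $k e^2/2$, and that the agent is paid a bonus $b$ per correct task. The function names and the parameter `cost_scale` (standing in for $k$) are hypothetical. Under these assumptions the agent's first-order condition is $b(1-a) = ke$, so the bonus needed to sustain a fixed effort grows like $1/(1-a)$.

```python
def required_bonus(effort, ai_accuracy, cost_scale=1.0):
    """Per-correct-task bonus b that makes `effort` optimal in the toy model.

    Assumed (hypothetical) model: P(correct) = a + (1 - a) * e and
    cost(e) = cost_scale * e**2 / 2. A risk-neutral agent maximizes
    b * P(correct) - cost(e), which gives b * (1 - a) = cost_scale * e.
    """
    if not 0.0 <= ai_accuracy < 1.0:
        raise ValueError("ai_accuracy must lie in [0, 1)")
    if not 0.0 <= effort <= 1.0:
        raise ValueError("effort must lie in [0, 1]")
    return cost_scale * effort / (1.0 - ai_accuracy)


def expected_payment_per_task(effort, ai_accuracy, cost_scale=1.0):
    """Expected payment per task when the bonus sustains `effort`."""
    bonus = required_bonus(effort, ai_accuracy, cost_scale)
    prob_correct = ai_accuracy + (1.0 - ai_accuracy) * effort
    return bonus * prob_correct


if __name__ == "__main__":
    target_effort = 0.5
    # The payment needed for the same effort increases as the AI improves.
    for a in (0.5, 0.9, 0.99, 0.999):
        pay = expected_payment_per_task(target_effort, a)
        print(f"AI accuracy {a}: expected payment per task {pay:.2f}")
```

In this toy setting the expected payment diverges as $a \to 1$ for any fixed positive effort, which matches the qualitative statement in the abstract; the theorem's actual bound and constant $C$ are not reproduced here.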

Figures (4)

  • Figure 1: Comparison of inference performance for Biden approval (top row) and Trump approval (bottom row). The left panel displays representative confidence intervals from all methods; the middle panel shows the average interval width as a function of the budget; and the right panel reports empirical coverage, i.e., the proportion of intervals covering the true mean.
  • Figure 2: Comparison of inference performance for the odds ratio. The left panel displays representative confidence intervals from all methods; the middle panel shows the average interval width as a function of the budget; and the right panel reports empirical coverage, i.e., the proportion of intervals covering the true odds ratio.
  • Figure 3: Percentage of budget saved in the AlphaFold experiment.
  • Figure 4: Percentage of budget saved by our method (relative to different baselines) when aiming to achieve the same performance at different budget levels.

Theorems & Definitions (17)

  • Theorem 3.1: Failure of linear accuracy-based payments
  • Theorem 3.2: Failure of accuracy-based payments
  • Theorem 4.1: Effort guarantees under incentive-robust payments
  • Theorem 5.1
  • Lemma 5.2
  • Example 5.3
  • Example 5.4
  • Example 5.5
  • Theorem 5.6
  • Proof of Theorem \ref{thm:impossible}
  • ...and 7 more