When Should Users Check? A Decision-Theoretic Model of Confirmation Frequency in Multi-Step AI Agent Tasks
Jieyu Zhou, Aryan Roy, Sneh Gupta, Daniel Weitekamp, Christopher J. MacLellan
TL;DR
This work addresses when users should intervene during long-horizon, agentic AI tasks. A formative study identifies a recurring Confirmation-Diagnosis-Correction-Redo (CDCR) pattern in how users monitor agent errors. Building on that pattern, the authors introduce a decision-theoretic, minimum-time scheduling model that places intermediate confirmation points to balance confirmation burden against error-diagnosis cost, and they evaluate it in a larger within-subjects experiment. Most participants (81%) preferred intermediate confirmation over confirm-at-end, and task completion time fell by 13.54%, with early-stage errors yielding the greatest gains. The work frames confirmation as a mixed-initiative interaction design problem, offers guidance for more reliable, user-supervised agent systems, and points to broader cost and trust considerations in practical deployments.
Abstract
Existing AI agents typically execute multi-step tasks autonomously and allow user confirmation only at the end. During execution, users have little control, which makes the confirm-at-end approach brittle: a single error can cascade and force a complete restart. Confirming every step avoids such failures but imposes tedious overhead. Balancing excessive interruptions against costly rollbacks remains an open challenge. We address this problem by modeling confirmation as a minimum-time scheduling problem. We conducted a formative study with eight participants, which revealed a recurring Confirmation-Diagnosis-Correction-Redo (CDCR) pattern in how users monitor errors. Based on this pattern, we developed a decision-theoretic model that determines time-efficient placement of confirmation points. We then evaluated our approach in a within-subjects study in which 48 participants monitored AI agents and repaired their mistakes while the agents executed tasks. Results show that 81 percent of participants preferred our intermediate confirmation approach over the confirm-at-end approach used by existing systems, and task completion time was reduced by 13.54 percent.
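The abstract does not give the model's equations, so the following is only a minimal sketch of how confirmation placement could be posed as a minimum-time scheduling problem; it is not the authors' formulation. It assumes each step has a known duration and an independent error probability, that every confirmation costs a fixed amount of time, that diagnosis time grows linearly with the number of unchecked steps in a segment, and that a faulty segment is redone once without further error. The function name `plan_confirmations` and all parameters are hypothetical.

```python
from typing import List, Tuple


def plan_confirmations(
    step_time: List[float],
    error_prob: List[float],
    confirm_cost: float,
    diagnose_cost_per_step: float,
) -> Tuple[List[int], float]:
    """Choose confirmation points that minimize expected total time.

    Illustrative assumptions (not taken from the paper):
    - a confirmation is always placed after the final step;
    - a segment is the run of steps between two consecutive confirmations;
    - if any step in a segment fails, the user pays a diagnosis cost
      proportional to the segment length and the whole segment is redone once.

    Returns the 1-indexed steps after which to confirm and the expected time.
    """
    n = len(step_time)
    # best[j]: minimal expected overhead for the first j steps,
    # given that a confirmation happens right after step j.
    best = [0.0] + [float("inf")] * n
    prev = [0] * (n + 1)

    for j in range(1, n + 1):
        p_ok = 1.0   # probability that every step in the segment succeeded
        redo = 0.0   # time needed to re-run the segment
        # Grow the last segment backwards: it covers 0-indexed steps i..j-1.
        for i in range(j - 1, -1, -1):
            p_ok *= 1.0 - error_prob[i]
            redo += step_time[i]
            length = j - i
            segment_cost = confirm_cost + (1.0 - p_ok) * (
                diagnose_cost_per_step * length + redo
            )
            if best[i] + segment_cost < best[j]:
                best[j] = best[i] + segment_cost
                prev[j] = i

    # Walk the back-pointers to recover the chosen confirmation points.
    checks: List[int] = []
    j = n
    while j > 0:
        checks.append(j)
        j = prev[j]
    checks.reverse()

    return checks, sum(step_time) + best[n]
```

Under these assumptions the two baselines from the abstract appear as limiting cases: with a very large `confirm_cost` the plan collapses to a single check at the end, and with a `confirm_cost` of zero (and nonzero error probabilities) it checks after every step. Intermediate values produce the kind of sparse, intermediate confirmation schedule the paper studies.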
