Accuracy Standards for AI at Work vs. Personal Life: Evidence from an Online Survey
Gaston Besanson, Federico Todeschini
TL;DR
The paper addresses how accuracy expectations for AI differ across professional and personal use, defining accuracy as context-specific reliability for probabilistic outputs. Using an online survey of $N=300$ adults (with $N=170$ answering both work and personal items), it finds a substantial gap: $24.1\%$ of respondents require top-box accuracy at work vs. $8.8\%$ in personal life, a $15.3$ percentage-point difference ($p<0.001$) in stringent thresholds, with broader differences visible on top-two-box and 1–5 scales. On the determinants of these trade-offs, higher importance placed on work accuracy and lower reliance on tools predict stricter work standards, while greater perceived personal impact when AI is unavailable is associated with less stringent work thresholds; resilience analyses reveal greater disruption in personal routines than at work when tools are unavailable. The findings inform AI design and deployment, arguing for higher reliability, robust fallbacks, and human-in-the-loop verification in professional settings, and suggesting greater tolerance for variance in personal contexts while remaining mindful of fragility from over-reliance. Overall, the study advances measurement of context-dependent accuracy and highlights practical implications for balancing reliability and convenience in AI-enabled work and life.
Abstract
We study how the accuracy people require before adopting AI-powered tools differs between professional and personal contexts, what determines those trade-offs, and how users cope when AI tools and apps are unavailable. Because modern AI systems (especially generative models) can produce acceptable but non-identical outputs, we define "accuracy" as context-specific reliability: the degree to which an output aligns with the user's intent within a tolerance threshold that depends on stakes and the cost of correction. In an online survey (N=300), among respondents who answered both accuracy items (N=170), the share requiring high accuracy (top-box) is 24.1% at work vs. 8.8% in personal life (+15.3 pp; z=6.29, p<0.001). The gap remains large under a broader top-two-box definition (67.0% vs. 32.9%) and on the full 1–5 ordinal scale (mean 3.86 vs. 3.08). Heavy app use and experience patterns correlate with stricter work standards (H2). When tools are unavailable (H3), respondents report more disruption in personal routines than at work (34.1% vs. 15.3%, p<0.01). We keep the main text focused on these substantive results and place the test taxonomy and power derivations in a technical appendix.
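As a rough illustration of how the top-box shares and the percentage-point gap above are computed, the following Python sketch may help. It is not the authors' analysis code. The counts 41 and 15 are an assumption, inferred only because they round to the reported 24.1% and 8.8% of N=170, and the function names `box_share` and `gap_pp` are hypothetical. The sketch does not reproduce the reported z statistic, which would require the paired respondent-level data.

```python
from typing import Sequence


def box_share(ratings: Sequence[int], min_rating: int) -> float:
    """Share of ratings at or above min_rating on a 1-5 scale.

    min_rating=5 gives the top-box share; min_rating=4 gives top-two-box.
    """
    if not ratings:
        raise ValueError("ratings must be non-empty")
    return sum(r >= min_rating for r in ratings) / len(ratings)


def gap_pp(share_a: float, share_b: float) -> float:
    """Difference between two shares, expressed in percentage points."""
    return 100.0 * (share_a - share_b)


# ASSUMPTION: counts inferred from the reported percentages, not taken
# from the paper's data (41/170 rounds to 24.1%, 15/170 rounds to 8.8%).
N_PAIRED = 170
WORK_TOP = 41
PERSONAL_TOP = 15

# Placeholder ratings: only the number of 5s matters for the top-box share,
# so all other answers are filled with an arbitrary lower rating (3).
work_ratings = [5] * WORK_TOP + [3] * (N_PAIRED - WORK_TOP)
personal_ratings = [5] * PERSONAL_TOP + [3] * (N_PAIRED - PERSONAL_TOP)

work_share = box_share(work_ratings, 5)
personal_share = box_share(personal_ratings, 5)
gap = gap_pp(work_share, personal_share)

print(f"{100 * work_share:.1f}% vs {100 * personal_share:.1f}% (gap {gap:+.1f} pp)")
```

Calling `box_share(ratings, 4)` on real respondent-level ratings would give the broader top-two-box share in the same way.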
