Future of Work with AI Agents: Auditing Automation and Augmentation Potential across the U.S. Workforce

Yijia Shao, Humishka Zope, Yucheng Jiang, Jiaxin Pei, David Nguyen, Erik Brynjolfsson, Diyi Yang

TL;DR

This paper tackles how AI agents will reshape work by introducing a worker-centered auditing framework that captures which tasks workers want automated or augmented and how those desires align with current capabilities. It builds WORKBank, a large-scale dataset combining 1,500 domain workers across 104 occupations with 52 AI experts evaluating 844 tasks, using an audio-enhanced survey and a new Human Agency Scale ($H1$–$H5$) to quantify human involvement. The study reveals a four-zone desire-capability landscape, highlights mismatches between worker desires and investment/tech focus, and shows that AI-agent integration may shift core human skills from information processing toward interpersonal and organizational competencies. These insights offer a concrete reference for prioritizing AI-agent R&D, guiding responsible deployment, and informing workforce development as workplace dynamics evolve. The framework advances beyond automate-or-not dichotomies by embracing augmentation and collaboration, with implications for policy, education, and industry practice.

Abstract

The rapid rise of compound AI systems (a.k.a., AI agents) is reshaping the labor market, raising concerns about job displacement, diminished human agency, and overreliance on automation. Yet, we lack a systematic understanding of the evolving landscape. In this paper, we address this gap by introducing a novel auditing framework to assess which occupational tasks workers want AI agents to automate or augment, and how those desires align with the current technological capabilities. Our framework features an audio-enhanced mini-interview to capture nuanced worker desires and introduces the Human Agency Scale (HAS) as a shared language to quantify the preferred level of human involvement. Using this framework, we construct the WORKBank database, building on the U.S. Department of Labor's O*NET database, to capture preferences from 1,500 domain workers and capability assessments from AI experts across 844 tasks spanning 104 occupations. Jointly considering worker desire and technological capability divides the tasks in WORKBank into four zones: Automation "Green Light" Zone, Automation "Red Light" Zone, R&D Opportunity Zone, and Low Priority Zone. This division highlights critical mismatches and opportunities for AI agent development. Moving beyond a simple automate-or-not dichotomy, our results reveal diverse HAS profiles across occupations, reflecting heterogeneous expectations for human involvement. Moreover, our study offers early signals of how AI agent integration may reshape the core human competencies, shifting from information-focused skills to interpersonal ones. These findings underscore the importance of aligning AI agent development with human desires and preparing workers for evolving workplace dynamics.

Paper Structure

This paper contains 55 sections, 3 equations, 10 figures, 10 tables.

Figures (10)

  • Figure 1: Overview of the auditing framework and key insights. The framework captures dual perspectives on automation and augmentation by eliciting both worker desires and expert assessments of technological capabilities. It guides participant reasoning through structured prompts and an audio-enhanced interface. We instantiate this framework to build the WORKBank database, enabling a data-driven analysis of worker-centered needs, the desire–capability landscape, the Human Agency Scale (HAS) spectrum, and implications for core human skills.
  • Figure 2: Levels of Human Agency Scale (HAS). We introduce the Human Agency Scale (i.e., H1-H5) to quantify the team dynamics and degree of human involvement required. HAS provides a shared language to quantify automation vs. augmentation, complementing the traditionally "AI-first" perspective used in defining levels of automation. Importantly, higher HAS levels are not inherently better; different levels suit different AI roles.
  • Figure 3: Sector-level distribution of workers in the WORKBank database compared to U.S. workforce statistics from the Bureau of Labor Statistics. a, Comparison between WORKBank worker distribution and the U.S. workforce employment statistics across all sectors (sectors not included in WORKBank marked with an asterisk). b, Comparison between WORKBank worker distribution and the U.S. workforce employment statistics limited to the 104 occupations included in our database.
  • Figure 4: First-hand data from domain workers reveals positive attitudes towards AI agent automation on certain occupational tasks, particularly due to perceived benefits such as freeing up time for high-value work. However, the sentiment varies notably across sectors. a, Automation desire scores $A_w(t)$ across 844 occupational tasks, ranked based on WORKBank data, together with sector-specific breakdowns. The distribution indicates a mixed attitude, revealing a high diversity of worker needs and preferences that should be considered in AI agent R&D. b, Reported reasons for responses with $A_w(t)\geq3$. The most selected reason, "Automating the task would free up my time for high-value work", accounts for 69.38% of the responses. c, Comparison with usage data from Claude.ai, an LLM-based chatbot (Dec 2024-Jan 2025, from Handa et al., 2025), shows that the top 10 occupations with the highest average automation desire represent only 1.26% of total usage. This highlights the importance of directly soliciting worker input, as usage data may lag behind actual workplace needs.
  • Figure 5: Integrating worker and AI expert perspectives divides the automation landscape into four zones: Automation "Green Light" Zone, Automation "Red Light" Zone, R&D Opportunity Zone, and Low Priority Zone. a, Tasks from WORKBank are plotted in this desire-capability landscape. b, We collect Y Combinator (YC) companies and map them to tasks based on the description on their official YC detail pages using gpt-4.1-mini. The average number of YC companies per task shows no significant difference across zones, highlighting the importance of steering more investment toward the Automation "Green Light" Zone and R&D Opportunity Zone. c, We collect AI agent research papers from arXiv and evaluate their applicability to each occupational task in our database using gpt-4.1-mini. Encouragingly, the paper-task mappings are concentrated more in the R&D Opportunity Zone, though increased emphasis on this area remains desirable.
  • ...and 5 more figures
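
The four-zone split named in the abstract and in Figure 5 can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each task carries an average worker desire score (written $A_w(t)$ in Figure 4) and an average expert capability score, both on a 1-5 scale, and that a single cutoff of 3.0 separates "high" from "low" on each axis. The cutoff, the `TaskRating` class, the `capability` field, and the reading of each zone as a high/low combination are all assumptions made for illustration; this excerpt does not state them.

```python
from dataclasses import dataclass

# Assumed cutoff: the midpoint of a 1-5 rating scale.
# The excerpt does not state the threshold actually used.
THRESHOLD = 3.0


@dataclass(frozen=True)
class TaskRating:
    """Hypothetical container for one task's averaged ratings."""
    task: str
    desire: float      # average worker automation desire, A_w(t)
    capability: float  # average expert-rated capability (assumed field)


def zone(rating: TaskRating, threshold: float = THRESHOLD) -> str:
    """Map a task to one of the four zones by desire and capability."""
    high_desire = rating.desire >= threshold
    high_capability = rating.capability >= threshold
    if high_desire and high_capability:
        return 'Automation "Green Light" Zone'
    if high_capability:
        # Technically feasible, but workers do not want it automated.
        return 'Automation "Red Light" Zone'
    if high_desire:
        # Workers want it, but the technology is not there yet.
        return "R&D Opportunity Zone"
    return "Low Priority Zone"


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    examples = [
        TaskRating("hypothetical task A", desire=4.2, capability=4.5),
        TaskRating("hypothetical task B", desire=1.8, capability=4.1),
        TaskRating("hypothetical task C", desire=4.0, capability=2.2),
        TaskRating("hypothetical task D", desire=1.5, capability=1.9),
    ]
    for example in examples:
        print(f"{example.task}: {zone(example)}")
```

Keeping the threshold as a parameter makes it easy to check how sensitive the zone assignment is to the chosen cutoff, which matters for tasks whose scores sit near the boundary.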