Unveiling Disparities in Web Task Handling Between Human and Web Agent
Kihoon Son, Jinhyeon Kwon, DaEun Choi, Tae Soo Kim, Young-Ho Kim, Sangdoo Yun, Juho Kim
TL;DR
The paper examines how humans and web agents perform web tasks, focusing on planning, action, and reflection. It draws on a think-aloud study with four participants across two tasks (Amazon shopping and Reddit search). The authors identify the distinct cognitive actions and on-site operations that humans employ and compare them to agent architectures, revealing gaps in knowledge updating and ambiguity handling. Humans continuously discover task- and site-related knowledge through UI understanding and trial and error, expand their knowledge space, and reflect retrospectively to revise plans, whereas current web agents rely more on predefined sub-plans and evident feedback. These findings motivate hybrid agent architectures with explicit support for task and site knowledge updating, information discovery, reflection, rollback, and tacit knowledge capture, enabling more human-like adaptability in web environments.
Abstract
With the advancement of Large Language Models (LLMs) and Large Vision-Language Models (LVMs), agents have shown significant capabilities in various tasks, such as data analysis, gaming, and code generation. Recently, there has been a surge in research on web agents capable of performing tasks within the web environment. However, the web poses unforeseeable scenarios, challenging the generalizability of these agents. This study investigates the disparities between human and web-agent performance in web tasks (e.g., information search) by concentrating on the planning, action, and reflection aspects of task execution. We conducted a web task study with a think-aloud protocol, revealing distinct cognitive actions and on-site operations employed by humans. A comparative examination of existing agent structures against human behavior and thought processes highlighted differences in knowledge updating and ambiguity handling during task performance. Humans demonstrated a propensity for exploring and modifying plans based on additional information and for investigating reasons for failure. These findings offer insights into designing planning, reflection, and information discovery modules for web agents, and into designing methods for capturing implicit human knowledge in web tasks.
