WebDART: Dynamic Decomposition and Re-planning for Complex Web Tasks
Jingbo Yang, Bairu Hou, Wei Wei, Shiyu Chang, Yujia Bao
TL;DR
WebDART targets complex web tasks that require long-horizon navigation and multi-step reasoning. It decomposes each objective into three subtasks (navigation, information extraction, and execution) and replans dynamically as new pages are revealed. This lets a frozen LLM focus on one capability at a time and adapt its plan when new page features appear, improving efficiency and accuracy without additional training. On WebChoreArena, WebDART raises success rates by up to 13.7 percentage points over previous state-of-the-art agents; it matches their performance on the easier WebArena suite and completes tasks with up to 14.7 fewer navigation steps. The gains hold across multiple backbone models, which suggests the approach is practical for scalable, adaptive web automation.
Abstract
Large language model (LLM) agents are becoming competent at straightforward web tasks, such as opening an item page or submitting a form, but still struggle with objectives that require long-horizon navigation, large-scale information extraction, and reasoning under constraints. We present WebDART, a general framework that enables a single LLM to handle such complex chores. WebDART (i) dynamically decomposes each objective into three focused subtasks (navigation, information extraction, and execution) so that the model concentrates on one skill at a time, and (ii) continuously replans the decomposition as new webpages are revealed, taking advantage of newly discovered filters or shortcuts and avoiding redundant exploration. Evaluated on WebChoreArena, WebDART lifts success rates by up to 13.7 percentage points over previous state-of-the-art (SOTA) agents, while matching their performance on the easier WebArena suite and completing tasks with up to 14.7 fewer navigation steps.
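The decompose-then-replan loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`Subtask`, `Page`, `initial_plan`, `replan`, `run`, `stub_step`) is hypothetical, the replanning rule is a toy heuristic standing in for an LLM call, and the web environment is replaced by a stub. It only shows the control flow: start with a navigation, extraction, and execution plan, then revise the remaining plan whenever a page exposes a feature that has not been seen before.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Subtask:
    kind: str  # "navigation", "extraction", or "execution"
    goal: str  # natural-language goal that would be handed to the frozen LLM


@dataclass
class Page:
    url: str
    # Features visible on the page, e.g. filters or shortcuts (assumed representation).
    features: List[str] = field(default_factory=list)


def initial_plan(objective: str) -> List[Subtask]:
    """Decompose an objective into the three subtask types named in the abstract."""
    return [
        Subtask("navigation", f"reach the page needed for: {objective}"),
        Subtask("extraction", f"collect the data needed for: {objective}"),
        Subtask("execution", f"compute or act on the data for: {objective}"),
    ]


def replan(plan: List[Subtask], page: Page, seen: Set[str]) -> List[Subtask]:
    """Toy replanning rule: use each newly discovered feature before continuing.

    A real system would ask the LLM to revise the plan; this heuristic is an assumption.
    """
    new_features = [f for f in page.features if f not in seen]
    seen.update(new_features)
    extra = [Subtask("navigation", f"apply {f}") for f in new_features]
    return extra + plan


def run(
    objective: str,
    step: Callable[[Subtask, Page], Page],
    start: Page,
    max_steps: int = 20,
) -> List[Subtask]:
    """Execute subtasks one at a time, replanning before each step."""
    plan = initial_plan(objective)
    seen: Set[str] = set()
    page = start
    trace: List[Subtask] = []
    while plan and len(trace) < max_steps:
        plan = replan(plan, page, seen)
        task = plan.pop(0)
        trace.append(task)
        page = step(task, page)  # the environment returns the next observed page
    return trace


def stub_step(task: Subtask, page: Page) -> Page:
    """Stand-in environment: any navigation lands on a page exposing a date filter."""
    if task.kind == "navigation":
        return Page("/orders", ["date filter"])
    return page


if __name__ == "__main__":
    for t in run("total spend in 2023", stub_step, Page("/home")):
        print(t.kind, "|", t.goal)
```

In this stub run, the first navigation step reveals a date filter, so the replanner inserts one extra navigation subtask that applies the filter before extraction and execution proceed. The `max_steps` cap is an added safeguard against unbounded replanning and is not taken from the paper.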
