
WebDART: Dynamic Decomposition and Re-planning for Complex Web Tasks

Jingbo Yang, Bairu Hou, Wei Wei, Shiyu Chang, Yujia Bao

TL;DR

WebDART tackles complex web tasks that require long-horizon navigation and multi-step reasoning. It dynamically decomposes each task into three subtasks (navigation, information extraction, and execution) and replans as new page features appear. This lets a frozen LLM focus on one capability at a time and adapt its plan without additional training, improving both efficiency and accuracy. WebDART achieves state-of-the-art performance on the complex tasks of WebChoreArena and maintains strong results on the simpler navigation tasks of WebArena, completing tasks with up to 14.7 fewer navigation steps. The gains hold across multiple backbone models, which supports its practical value for scalable, adaptive web automation.

Abstract

Large language model (LLM) agents are becoming competent at straightforward web tasks, such as opening an item page or submitting a form, but still struggle with objectives that require long-horizon navigation, large-scale information extraction, and reasoning under constraints. We present WebDART, a general framework that enables a single LLM to handle such complex chores. WebDART (i) dynamically decomposes each objective into three focused subtasks (navigation, information extraction, and execution) so that the model concentrates on one skill at a time, and (ii) continuously replans the decomposition as new webpages are revealed, taking advantage of newly discovered filters or shortcuts and avoiding redundant exploration. Evaluated on WebChoreArena, WebDART lifts success rates by up to 13.7 percentage points over previous state-of-the-art (SOTA) agents, while matching their performance on the easier WebArena suite and completing tasks with up to 14.7 fewer navigation steps.
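The interplay between decomposition and re-planning can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' implementation. The `Plan` class and the `initial_plan`, `replan`, and `navigate` functions are hypothetical names. The LLM calls are replaced by hard-coded rules, and the simulated site is a list of listing pages whose first page exposes a price filter. The sketch shows how a newly revealed filter can move a price constraint from the execution subtask into navigation, the kind of shortcut that avoids redundant exploration. The step counts it prints are illustrative only and are unrelated to the reported results.

```python
from dataclasses import dataclass, replace


@dataclass
class Plan:
    """Hypothetical container for the three-subtask decomposition."""
    navigation: str
    extraction: str
    execution: str
    seen_features: frozenset = frozenset()
    use_price_filter: bool = False


def initial_plan(objective: str) -> Plan:
    # Placeholder (assumption) for the LLM call that produces the first plan.
    return Plan(
        navigation=f"Open every product listing page relevant to: {objective}",
        extraction="Record the name and price of each listed product",
        execution="Filter the records by the price range with Python",
    )


def replan(plan: Plan, new_features: set) -> Plan:
    # Placeholder (assumption) for the re-planning module: a newly revealed
    # price filter moves the constraint from execution into navigation.
    plan = replace(plan, seen_features=plan.seen_features | new_features)
    if "price_filter" in new_features:
        plan = replace(
            plan,
            navigation="Apply the site's price filter and open the filtered listing",
            execution="Return the filtered products unchanged",
            use_price_filter=True,
        )
    return plan


def navigate(objective: str, pages: list, dynamic: bool = True):
    """Simulate navigation over listing pages; return the final plan and step count."""
    plan = initial_plan(objective)
    steps = 0
    for page in pages:
        steps += 1  # one action to open this listing page
        new_features = set(page["features"]) - plan.seen_features
        if dynamic and new_features:
            plan = replan(plan, new_features)
        if plan.use_price_filter:
            steps += 1  # one action to apply the filter; remaining pages are skipped
            break
    return plan, steps


# Toy site: ten listing pages, the first of which exposes a price filter.
pages = [{"features": ["price_filter", "sort"]}] + [{"features": []}] * 9
_, static_steps = navigate("products under $50", pages, dynamic=False)
plan, dynamic_steps = navigate("products under $50", pages)
print(static_steps, dynamic_steps)
```

In a real agent the LLM would decide how to rewrite each subtask. The hard-coded rule in `replan` only stands in for that decision.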

Paper Structure

This paper contains 36 sections, 1 equation, 3 figures, 4 tables.

Figures (3)

  • Figure 1: (Top) Existing LLM-based web agents perform well on simple tasks, but their success rates drop on complex tasks that require non-trivial reasoning, such as applying a price-range filter (bottom left). WebDART overcomes this limitation by dynamically decomposing the objective into three subtasks: navigation, information extraction, and execution. (Bottom right) Consequently, WebDART significantly outperforms the current state of the art on WebChoreArena across all task categories. Backbone LLM: GPT-5.
  • Figure 2: Overview of the WebDART framework. A complex web task is dynamically decomposed into three sequential subtasks. (1) Navigation: the agent explores the site—issuing actions such as click, type, and go_back—to gather every page that could contain the required information. (2) Information extraction: given these pages, a dedicated module isolates task-relevant content and converts it into a standardised, structured form based on the objective. (3) Execution: the extracted data are analysed to meet the task constraints, e.g., by generating and running Python code on the fly to perform filtering, aggregation, or other computations.
  • Figure 3: Illustration of the WebDART framework during navigation. An initial plan is generated before navigation starts. The navigation agent issues an action at each step. When new web elements (e.g., filters, sorting options) appear, the dynamic re-planning module updates the decomposition and plan, enabling the agent to adapt its strategy for more efficient execution.
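The execution subtask in Figure 2 operates on structured records rather than raw pages. As a hedged illustration, assume the extraction module has already produced a list of product records with numeric prices. The records, their field names, and the `execute` function below are hypothetical. Under that assumption, code the agent might generate for a price-range task like the one in Figure 1 could look roughly like this:

```python
from statistics import mean

# Hypothetical output of the information-extraction module: one standardized
# record per product, with prices already parsed into floats.
records = [
    {"name": "Desk lamp", "price": 24.99, "rating": 4.5},
    {"name": "Office chair", "price": 129.00, "rating": 4.1},
    {"name": "Monitor arm", "price": 45.50, "rating": 4.7},
    {"name": "Keyboard", "price": 59.99, "rating": 3.9},
]


def execute(records: list, low: float, high: float) -> dict:
    """Apply an inclusive price-range constraint, then aggregate the matches."""
    in_range = [r for r in records if low <= r["price"] <= high]
    if not in_range:
        return {"count": 0, "names": [], "mean_price": None, "best_rated": None}
    return {
        "count": len(in_range),
        "names": sorted(r["name"] for r in in_range),
        "mean_price": round(mean(r["price"] for r in in_range), 2),
        "best_rated": max(in_range, key=lambda r: r["rating"])["name"],
    }


print(execute(records, 20, 60))
```

Running the filtering and aggregation as code makes each computation explicit and reproducible. This is one plausible reason for keeping execution separate from extraction.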