
Plan-MCTS: Plan Exploration for Action Exploitation in Web Navigation

Weiming Zhang, Jihong Wang, Jiamu Zhou, Qingyao Li, Xinbei Ma, Congmin Zheng, Xingyu Lou, Weiwen Liu, Zhuosheng Zhang, Jun Wang, Yong Yu, Weinan Zhang

TL;DR

Plan-MCTS introduces a plan-space Monte Carlo Tree Search framework for autonomous web navigation to address sparse valid action paths and noisy execution histories. By decoupling strategic planning from execution grounding, it builds a Dense Plan Tree and maintains an Abstracted Semantic History, guided by a Dual-Gating Reward and on-policy Structural Refinement to repair failed subplans. Empirical results on the WebArena benchmark show Plan-MCTS achieving state-of-the-art task success and improved search efficiency compared to action-space baselines, with better scalability as compute budgets grow. The work demonstrates that high-level semantic planning, coupled with robust grounding and verification, significantly enhances long-horizon web navigation performance and reliability.

Abstract

Large Language Models (LLMs) have empowered autonomous agents to handle complex web navigation tasks. While recent studies integrate tree search to enhance long-horizon reasoning, applying these algorithms to web navigation faces two critical challenges: sparse valid paths that lead to inefficient exploration, and a noisy context that dilutes accurate state perception. To address these challenges, we introduce Plan-MCTS, a framework that reformulates web navigation by shifting exploration to a semantic Plan Space. By decoupling strategic planning from execution grounding, it transforms the sparse action space into a Dense Plan Tree for efficient exploration, and distills noisy contexts into an Abstracted Semantic History for precise state awareness. To ensure efficiency and robustness, Plan-MCTS incorporates a Dual-Gating Reward that strictly validates both physical executability and strategic alignment, and Structural Refinement for on-policy repair of failed subplans. Extensive experiments on WebArena demonstrate that Plan-MCTS achieves state-of-the-art performance, surpassing current approaches in both task effectiveness and search efficiency.
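
To make the plan-space idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a tree whose nodes hold high-level subplans, the standard UCT selection rule, and a hypothetical gating rule in which a subplan earns reward only when it was physically executable, with the reward then set by a strategic-alignment score in [0, 1]. The names `PlanNode`, `dual_gated_reward`, `uct_score`, `select_child`, and `backpropagate`, as well as the exact form of the reward, are illustrative assumptions; the abstract does not specify them.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PlanNode:
    """One node of a plan tree: a high-level subplan, not an atomic action."""
    subplan: str
    parent: Optional["PlanNode"] = None
    children: List["PlanNode"] = field(default_factory=list)
    visits: int = 0
    total_reward: float = 0.0


def dual_gated_reward(executable: bool, alignment: float) -> float:
    """Hypothetical gating rule (an assumption, not the paper's formula):
    a subplan that could not be executed gets zero reward; otherwise the
    reward is the alignment score clamped to [0, 1]."""
    if not executable:
        return 0.0
    return max(0.0, min(1.0, alignment))


def uct_score(node: PlanNode, c: float = 1.4) -> float:
    """Standard UCT: mean reward plus an exploration bonus.
    Intended for non-root nodes, which always have a parent."""
    if node.visits == 0:
        return math.inf  # unvisited subplans are tried first
    mean = node.total_reward / node.visits
    bonus = c * math.sqrt(math.log(node.parent.visits) / node.visits)
    return mean + bonus


def select_child(node: PlanNode) -> PlanNode:
    """Pick the child subplan with the highest UCT score."""
    return max(node.children, key=uct_score)


def backpropagate(node: Optional[PlanNode], reward: float) -> None:
    """Add the reward to the node and every ancestor up to the root."""
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        node = node.parent
```

In this sketch, a subplan that fails the executability gate contributes nothing to its branch's value, so later selections favor branches whose subplans were both executed and judged to be aligned with the task.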

Paper Structure

This paper contains 46 sections, 6 equations, 10 figures, and 3 tables.

Figures (10)

  • Figure 1: Comparison between Action Space Search and Plan Space Search in web navigation.
  • Figure 2: The overall framework of Plan-MCTS. We employ MCTS to explore high-level plans for strategic reasoning, while utilizing an Operator to exploit these plans through atomic action grounding.
  • Figure 3: Efficiency comparison between Action-Space and Plan-Space search.
  • Figure 4: Performance comparison between Plan-MCTS and action-level search when we scale the search budget.
  • Figure 5: Performance comparison in terms of different context components.
  • ...and 5 more figures