STRUCTUREDAGENT: Planning with AND/OR Trees for Long-Horizon Web Tasks

Elita Lobo, Xu Chen, Jingjing Meng, Nan Xi, Yang Jiao, Chirag Agarwal, Yair Zick, Yan Gao

TL;DR

STRUCTUREDAGENT is a hierarchical planning framework with two core components: an online hierarchical planner that uses dynamic AND/OR trees for efficient search, and a structured memory module that tracks and maintains candidate solutions to improve constraint satisfaction in information-seeking tasks.

Abstract

Recent advances in large language models (LLMs) have enabled agentic systems for sequential decision-making. Such agents must perceive their environment, reason across multiple time steps, and take actions that optimize long-term objectives. However, existing web agents struggle on complex, long-horizon tasks due to limited in-context memory for tracking history, weak planning abilities, and greedy behaviors that lead to premature termination. To address these challenges, we propose STRUCTUREDAGENT, a hierarchical planning framework with two core components: (1) an online hierarchical planner that uses dynamic AND/OR trees for efficient search and (2) a structured memory module that tracks and maintains candidate solutions to improve constraint satisfaction in information-seeking tasks. The framework also produces interpretable hierarchical plans, enabling easier debugging and facilitating human intervention when needed. Our results on WebVoyager, WebArena, and custom shopping benchmarks show that STRUCTUREDAGENT improves performance on long-horizon web-browsing tasks compared to standard LLM-based agents.
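
As a rough illustration of the second component, the sketch below shows one way a structured memory could track candidate solutions against task constraints. It is a minimal sketch under stated assumptions, not the paper's implementation: the class `CandidateMemory`, its methods, and the example constraints are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# A constraint is a predicate over the attributes recorded so far (hypothetical type).
Constraint = Callable[[Dict[str, Any]], bool]


@dataclass
class CandidateMemory:
    """Hypothetical structured memory: candidates plus named constraints."""
    constraints: Dict[str, Constraint]
    candidates: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def record(self, key: str, **attrs: Any) -> None:
        # Merge newly observed attributes into the candidate's record.
        self.candidates.setdefault(key, {}).update(attrs)

    def unmet(self, key: str) -> List[str]:
        # Names of constraints the candidate does not (yet) satisfy.
        attrs = self.candidates[key]
        return [name for name, check in self.constraints.items() if not check(attrs)]

    def satisfying(self) -> List[str]:
        # Candidates that currently satisfy every constraint.
        return [key for key in self.candidates if not self.unmet(key)]


# Illustrative constraints for a shopping-style task (assumed, not from the paper).
# Missing attributes count as unmet, so a candidate is not accepted prematurely.
example_constraints: Dict[str, Constraint] = {
    "price": lambda a: a.get("price", float("inf")) <= 50,
    "rating": lambda a: a.get("rating", 0) >= 4,
}
```

Keeping candidates outside the prompt in a structure like this lets an agent check which constraints are still unverified before it terminates, which is the failure mode the abstract attributes to greedy agents.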

Paper Structure (22 sections, 14 figures, 21 tables, 10 algorithms)

Figures (14)

  • Figure 1: Illustration of StructuredAgent solving a web task via greedy depth-first search (DFS) over a dynamically constructed AND/OR tree. The root node represents the task objective and is expanded into subtasks that are progressively refined and executed. Node types are color-coded to distinguish OR ($\vee$), AND ($\wedge$), and ACTION nodes. The top half tracks the corresponding DFS stack states, where each node enters the stack in one of three states: ENTERING (ENT), EXITING (EXT), or FAILED (FAIL). Node repair and pruning are supported but not triggered in this example, and human intervention is optional.
  • Figure 2: Distribution of average HTML observation lengths (in tokens) per timestep across agent trajectories for WebVoyager and WebArena. Token counts reflect only raw web page content, excluding any additional context provided to the agent.
  • Figure 3: Illustration of the StructuredAgent Framework.
  • Figure 4: Illustration of node state transitions during the iterative, modified greedy depth-first search on an AND/OR tree. Unlike traditional depth-first search, nodes may be revisited and repeatedly transition through ENTERING, EXITING, and FAILED states.
  • Figure 5: Overview of the primary operations used to construct and maintain the AND/OR tree structure.
  • ...and 9 more figures
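
The stack-based traversal described in the captions of Figures 1 and 4 can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: the tree is fixed rather than dynamically expanded, node repair and pruning are omitted, and the names `Node`, `Kind`, `State`, `link`, and `solve` are hypothetical. It shows only how AND, OR, and ACTION nodes could pass through ENTERING, EXITING, and FAILED states on an explicit stack.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional, Tuple


class Kind(Enum):
    AND = "and"        # succeeds only if every child succeeds
    OR = "or"          # succeeds as soon as one child succeeds
    ACTION = "action"  # leaf that executes something


class State(Enum):
    ENTERING = "ENT"
    EXITING = "EXT"
    FAILED = "FAIL"


@dataclass
class Node:
    name: str
    kind: Kind
    children: List["Node"] = field(default_factory=list)
    run: Optional[Callable[[], bool]] = None  # used by ACTION nodes only
    parent: Optional["Node"] = None
    next_child: int = 0  # index of the next child to try


def link(node: Node, parent: Optional[Node] = None) -> Node:
    """Set parent pointers for a statically built tree."""
    node.parent = parent
    for child in node.children:
        link(child, node)
    return node


def solve(root: Node) -> Tuple[bool, List[Tuple[str, str]]]:
    """Greedy DFS with an explicit stack of (node, state) pairs."""
    trace: List[Tuple[str, str]] = []
    stack = [(root, State.ENTERING)]
    while stack:
        node, state = stack.pop()
        trace.append((node.name, state.value))
        if state is State.ENTERING:
            if node.kind is Kind.ACTION:
                ok = node.run() if node.run else False
                stack.append((node, State.EXITING if ok else State.FAILED))
            elif node.next_child < len(node.children):
                child = node.children[node.next_child]
                node.next_child += 1
                stack.append((child, State.ENTERING))
            else:
                # No children left: AND has finished all of them,
                # OR has exhausted every alternative.
                done = State.EXITING if node.kind is Kind.AND else State.FAILED
                stack.append((node, done))
        elif state is State.EXITING:
            if node.parent is None:
                return True, trace
            # One success satisfies an OR parent; an AND parent is
            # re-entered so it can move on to its next child.
            nxt = State.EXITING if node.parent.kind is Kind.OR else State.ENTERING
            stack.append((node.parent, nxt))
        else:  # State.FAILED
            if node.parent is None:
                return False, trace
            # One failure fails an AND parent; an OR parent is
            # re-entered so it can try its next alternative.
            nxt = State.FAILED if node.parent.kind is Kind.AND else State.ENTERING
            stack.append((node.parent, nxt))
    return False, trace
```

Because a parent is pushed back in the ENTERING state after each child finishes, the same node appears on the stack several times, which mirrors the caption's remark that nodes may be revisited.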