A Goal Without a Plan Is Just a Wish: Efficient and Effective Global Planner Training for Long-Horizon Agent Tasks

Shuzheng Si, Haozhe Zhao, Kangyang Luo, Gang Chen, Fanchao Qi, Minjia Zhang, Baobao Chang, Maosong Sun

TL;DR

This work tackles the planning gap in plan-and-execute, long-horizon LLM-based agents by introducing EAGLET, a plug-and-play global planner trained without manual labor. The approach first synthesizes high-quality global plans with an advanced reasoning LLM and filters them via Homologous Consensus Filtering, then cold-starts the planner with supervised fine-tuning. It then refines the planner in a rule-based RL stage that uses an Executor Capability Gain Reward and optimizes the planner with GRPO, with the aim of generalizing across tasks and executors. Empirical results on ScienceWorld, ALFWorld, and WebShop show state-of-the-art performance and an 8x reduction in training cost compared to RL-based baselines. The work advances explicit global planning in LLM agents, enabling plan-informed execution without manual data curation.

Abstract

Agents based on large language models (LLMs) struggle with brainless trial-and-error and generating hallucinatory actions due to a lack of global planning in long-horizon tasks. In this paper, we introduce a plan-and-execute framework and propose EAGLET, an efficient and effective planner training method to enhance the executor agent's planning abilities without human effort. Specifically, we train a plug-and-play global planner through a two-step process: we first synthesize high-quality plans from an advanced LLM using our proposed homologous consensus filtering strategy, and apply fine-tuning as a cold start. Moreover, we further improve the planner with a rule-based reinforcement learning stage using a novel executor capability gain reward, ensuring it can handle task instructions of varying difficulty. Experiments on three long-horizon agent tasks show that executor agents equipped with our planner outperform existing methods, achieving new state-of-the-art performance. Meanwhile, EAGLET reduces training costs by 8x compared to RL-based baselines, and it does not require manual effort or extra training data, offering an efficient and effective solution.
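
The TL;DR above names GRPO as the optimizer for the RL stage. As a rough illustration only, the sketch below shows the group-relative advantage normalization that GRPO is generally known for: several plans are sampled for the same task instruction, each receives a scalar reward, and each reward is standardized against its own group. The function name `group_relative_advantages`, the `eps` term, the use of the population standard deviation, and the example reward values are assumptions made for illustration; they are not taken from the paper, and the paper's executor capability gain reward is not reproduced here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled plans.

    `rewards` holds one scalar reward per plan sampled for the same
    task instruction. The reward function itself is outside this sketch.
    """
    mu = mean(rewards)
    # Population standard deviation is an assumption; implementations differ.
    sigma = pstdev(rewards)
    # eps avoids division by zero when every plan gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled plans on one task.
example_rewards = [1.0, 0.0, 0.0, 1.0]
example_advantages = group_relative_advantages(example_rewards)
```

Plans that score above their group's mean get a positive advantage and the rest get a negative one, so no separately trained value model is needed to form the baseline.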

Paper Structure

This paper contains 21 sections, 9 equations, 15 figures, 10 tables.

Figures (15)

  • Figure 1: Traditional agent planning vs. Agent planning with our planner EAGLET. In this way, executor agents can complete tasks better within fewer interactions.
  • Figure 2: EAGLET vs. previous methods: we introduce a plug-and-play, efficient, and effective global planner to provide explicit guidance to mitigate planning hallucinations without human effort.
  • Figure 3: The Overall Process of EAGLET, including (1) Cold-Start SFT: We synthesize high-quality global plans using the homologous consensus filtering method for the SFT stage. (2) RL Training: We further refine the planner using a rule-based RL approach with the designed executor capability gain reward.
  • Figure 4: Plan Quality Analysis. The comparison of GPT-4.1-generated and our plans on ALFWorld.
  • Figure 5: Case study from ALFWorld.
  • ...and 10 more figures