TodoEvolve: Learning to Architect Agent Planning Systems

Jiaxi Liu, Yanzuo Jiang, Guibin Zhang, Zihan Zhang, Heng Chang, Zhenfei Yin, Qibing Ren, Junchi Yan

TL;DR

TodoEvolve addresses the rigidity of fixed planning structures by introducing a meta-planning framework that automatically synthesizes task-adaptive planning architectures. The PlanFactory codebase provides a unified four-dimension design space (Topology, Initialization, Adaptation, Navigation) to host diverse planning paradigms, while Todo-14B, trained with Impedance-Guided Preference Optimization (IGPO), learns to instantiate and revise these architectures for each task. Through a two-stage curriculum (structural competence via SFT, followed by impedance-aware alignment), TodoEvolve achieves robust cross-domain performance, consistently surpassing carefully engineered planners on five agentic benchmarks with favorable efficiency trade-offs. The work demonstrates that dynamically architected planning, guided by verifiable execution signals, can significantly enhance long-horizon reasoning in open-ended environments and generalize across multiple backbones.

Abstract

Planning has become a central capability for contemporary agent systems in navigating complex, long-horizon tasks, yet existing approaches predominantly rely on fixed, hand-crafted planning structures that lack the flexibility to adapt to the structural diversity of open-ended problems. To address this limitation, we introduce TodoEvolve, a meta-planning paradigm that autonomously synthesizes and dynamically revises task-specific planning architectures. Specifically, we first construct PlanFactory, a modular design space that standardizes diverse planning paradigms within a unified codebase encompassing topology, initialization, adaptation, and navigation, thereby providing a common interface for heterogeneous planning patterns. Leveraging PlanFactory, we collect high-quality planning trajectories and train Todo-14B via Impedance-Guided Preference Optimization (IGPO), a multi-objective reinforcement learning objective that encourages the generation of planning systems that are performant, stable, and token-efficient across arbitrary tasks and agent backbones. Empirical evaluations on five agentic benchmarks demonstrate that TodoEvolve consistently surpasses carefully engineered planning modules while maintaining economical API costs and runtime overhead.
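
To make the four-dimension design space more concrete, the following is a minimal sketch of how a single planning architecture might be recorded as a point in that space. It is not PlanFactory's actual interface: the class names, the field types, and the example option values (such as "linear" or "periodic_check") are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Topology(Enum):
    """Hypothetical topology options; the real option set is not given here."""
    LINEAR = "linear"
    TREE = "tree"
    GRAPH = "graph"


@dataclass(frozen=True)
class PlanningArchitecture:
    """One point in the four-dimension design space (illustrative only)."""
    topology: Topology      # how plan items are connected
    initialization: str     # how the initial plan is built
    adaptation: str         # when and how the plan is revised
    navigation: str         # how the next plan item is selected

    def describe(self) -> str:
        # Render the configuration as a single readable line.
        return (
            f"topology={self.topology.value}, init={self.initialization}, "
            f"adapt={self.adaptation}, nav={self.navigation}"
        )


# Example: a linear plan with a periodic validation trigger (assumed values).
arch = PlanningArchitecture(
    topology=Topology.LINEAR,
    initialization="upfront_decomposition",
    adaptation="periodic_check",
    navigation="in_order",
)
print(arch.describe())
```

Under this reading, a meta-planner's job is to choose a value for each of the four fields per task, rather than using one fixed combination for every task.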

Paper Structure

This paper contains 35 sections, 5 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: The overall inference workflow of TodoEvolve. The system first constructs a customized planning system along four dimensions (topology, initialization, adaptation, and navigation) and then deploys it in real time to orchestrate agent execution.
  • Figure 2: Task-Dependent Performance Variability.
  • Figure 3: Ablation Analysis on GAIA Level 2. We compare the following variants: BS (Base Model), SFT (SFT-Only), ZS (Zero-Shot), and TodoEvolve.
  • Figure 4: Evolved planning architectures in real-world instantiation. The system provides adaptive, state-aware structural scaffolding that spans from macro-topology initialization to granular adaptation and navigation during the execution stage, effectively steering the agent toward robust and resilient inference.
  • Figure 5: Linear Sequential Planning for Multi-Criteria Filtering. For a query requiring strict multi-stage filtering and calculation (identifying countries based on migration thresholds followed by crime index analysis), TodoEvolve instantiates a linear execution topology. The system prioritizes a sequential "fetch-and-filter" pipeline to manage data dependencies, incorporating a periodic adaptation trigger to validate intermediate retrieval results before proceeding to the final synthesis and verification stage. This structure minimizes branching overhead for tasks where step-wise logical progression is paramount.
  • ...and 2 more figures
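
The linear "fetch-and-filter" structure described in the Figure 5 caption can be sketched as a sequential pipeline with a periodic validation hook. This is only an illustration under stated assumptions: the function `run_linear_plan`, its `check_every` parameter, and the toy rows and thresholds below are hypothetical and do not come from the paper.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]


def run_linear_plan(
    steps: List[Step],
    data: List[Row],
    validate: Callable[[List[Row]], bool],
    check_every: int = 2,
) -> Tuple[List[Row], List[str]]:
    """Run steps strictly in order, validating every `check_every` steps."""
    log: List[str] = []
    for i, step in enumerate(steps, start=1):
        data = step(data)
        log.append(f"step {i}")
        # Periodic adaptation trigger: check intermediate results before
        # moving on; a real system might revise the plan instead of raising.
        if i % check_every == 0:
            if not validate(data):
                raise ValueError(f"validation failed after step {i}")
            log.append(f"validated after step {i}")
    return data, log


# Toy data with made-up values, for illustration only.
rows: List[Row] = [
    {"country": "A", "migration": 12.0, "crime_index": 30.0},
    {"country": "B", "migration": 3.0, "crime_index": 55.0},
    {"country": "C", "migration": 9.0, "crime_index": 41.0},
]
steps: List[Step] = [
    lambda rs: [r for r in rs if r["migration"] >= 5.0],   # filter stage
    lambda rs: sorted(rs, key=lambda r: r["crime_index"]),  # analysis stage
]
result, log = run_linear_plan(
    steps, rows, validate=lambda rs: len(rs) > 0, check_every=1
)
print([r["country"] for r in result], log)
```

Because each step consumes the previous step's output, there is no branching to manage; the only control decision is when to run the validation check.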