TodoEvolve: Learning to Architect Agent Planning Systems
Jiaxi Liu, Yanzuo Jiang, Guibin Zhang, Zihan Zhang, Heng Chang, Zhenfei Yin, Qibing Ren, Junchi Yan
TL;DR
TodoEvolve addresses the rigidity of fixed planning structures by introducing a meta-planning framework that automatically synthesizes task-adaptive planning architectures. The PlanFactory codebase provides a unified four-dimension design space (Topology, Initialization, Adaptation, Navigation) to host diverse planning paradigms, while Todo-14B, trained with Impedance-Guided Preference Optimization (IGPO), learns to instantiate and revise these architectures for each task. Through a two-stage curriculum (structural competence via SFT, followed by impedance-aware alignment), TodoEvolve achieves robust cross-domain performance, consistently surpassing carefully engineered planners on five agentic benchmarks with favorable efficiency trade-offs. The work demonstrates that dynamically synthesized planning architectures, guided by verifiable execution signals, can significantly enhance long-horizon reasoning in open-ended environments and generalize across multiple backbones.
Abstract
Planning has become a central capability for contemporary agent systems in navigating complex, long-horizon tasks, yet existing approaches predominantly rely on fixed, hand-crafted planning structures that lack the flexibility to adapt to the structural diversity of open-ended problems. To address this limitation, we introduce TodoEvolve, a meta-planning paradigm that autonomously synthesizes and dynamically revises task-specific planning architectures. Specifically, we first construct PlanFactory, a modular design space that standardizes diverse planning paradigms within a unified codebase encompassing topology, initialization, adaptation, and navigation, thereby providing a common interface for heterogeneous planning patterns. Leveraging PlanFactory, we collect high-quality planning trajectories and train Todo-14B via Impedance-Guided Preference Optimization (IGPO), a multi-objective reinforcement learning objective that encourages the generation of planning systems that are performant, stable, and token-efficient across arbitrary tasks and agent backbones. Empirical evaluations on five agentic benchmarks demonstrate that TodoEvolve consistently surpasses carefully engineered planning modules while maintaining economical API costs and runtime overhead.
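To make the four-dimension design space more concrete, the following is a minimal Python sketch of how a planner could be expressed as one topology label plus three pluggable functions. It is an illustration only, not PlanFactory's actual interface: the names PlanningArchitecture, PlanState, PlanNode, linear_init, mark_done, first_pending, and run are all hypothetical, and the semicolon-based goal splitting stands in for whatever initialization a real planner would perform.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class PlanNode:
    """One step of a plan (hypothetical structure)."""
    task: str
    done: bool = False


@dataclass
class PlanState:
    """The plan as a whole; a linear topology is just an ordered list."""
    nodes: List[PlanNode] = field(default_factory=list)


@dataclass
class PlanningArchitecture:
    """Hypothetical bundle of the four dimensions named in the abstract."""
    topology: str                                          # e.g. "linear"
    initialize: Callable[[str], PlanState]                 # goal -> initial plan
    adapt: Callable[[PlanState, int, str], PlanState]      # revise plan after a step
    navigate: Callable[[PlanState], Optional[int]]         # choose next step, or None


def linear_init(goal: str) -> PlanState:
    # Assumption: the goal is a semicolon-separated list of steps.
    steps = [s.strip() for s in goal.split(";") if s.strip()]
    return PlanState([PlanNode(s) for s in steps])


def mark_done(state: PlanState, idx: int, observation: str) -> PlanState:
    # Simplest possible adaptation: mark the executed step as finished.
    state.nodes[idx].done = True
    return state


def first_pending(state: PlanState) -> Optional[int]:
    # Navigation for a linear topology: pick the first unfinished step.
    for i, node in enumerate(state.nodes):
        if not node.done:
            return i
    return None


def run(arch: PlanningArchitecture, goal: str,
        execute: Callable[[str], str]) -> List[str]:
    """Drive any architecture through the same initialize/navigate/adapt loop."""
    state = arch.initialize(goal)
    trace: List[str] = []
    while (idx := arch.navigate(state)) is not None:
        observation = execute(state.nodes[idx].task)
        trace.append(observation)
        state = arch.adapt(state, idx, observation)
    return trace


linear_planner = PlanningArchitecture("linear", linear_init, mark_done, first_pending)
trace = run(linear_planner, "search; read; answer", lambda task: f"did {task}")
```

Under this framing, swapping in a different planning paradigm would mean supplying different initialize, adapt, and navigate functions while the driver loop stays unchanged, which is one plausible reading of the "common interface" described above.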
