Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning
Peng Xia, Kaide Zeng, Jiaqi Liu, Can Qin, Fang Wu, Yiyang Zhou, Caiming Xiong, Huaxiu Yao
TL;DR
Agent0 tackles the data bottleneck in LLM agents by enabling fully autonomous, zero-data co-evolution between a Curriculum Agent and an Executor Agent, enhanced with a code interpreter tool. The Curriculum Agent generates frontier tasks and the Executor Agent learns to solve them, forming a self-reinforcing loop in which task difficulty rises as the executor's capabilities improve. Key innovations include GRPO-based curriculum optimization, Ambiguity-Dynamic Policy Optimization to handle pseudo-label noise, and multi-turn tool-augmented reasoning that extends the range of problems the executor can solve. Empirically, Agent0 improves Qwen3-8B-Base by 18% on mathematical reasoning and 24% on general reasoning benchmarks and generalizes well, pointing to a scalable path toward self-improving, tool-enabled agents without external data.
Abstract
Large Language Model (LLM) Agents, often trained with Reinforcement Learning (RL), are constrained by a dependency on human-curated data, limiting scalability and tethering AI to human knowledge. Existing self-evolution frameworks offer an alternative but are typically restricted by the model's inherent capabilities and single-round interactions, hindering the development of complex curricula involving tool use or dynamic reasoning. We introduce Agent0, a fully autonomous framework that evolves high-performing agents without external data through multi-step co-evolution and seamless tool integration. Agent0 establishes a symbiotic competition between two agents initialized from the same base LLM: a curriculum agent that proposes increasingly challenging frontier tasks, and an executor agent that learns to solve them. We integrate external tools to enhance the executor's problem-solving capacity; this improvement, in turn, pressures the curriculum agent to construct more complex, tool-aware tasks. Through this iterative process, Agent0 establishes a self-reinforcing cycle that continuously produces high-quality curricula. Empirically, Agent0 substantially boosts reasoning capabilities, improving the Qwen3-8B-Base model by 18% on mathematical reasoning and 24% on general reasoning benchmarks. Code is available at https://github.com/aiming-lab/Agent0.
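The feedback loop described in the abstract can be illustrated with a deliberately simplified toy simulation. This is a sketch under stated assumptions, not the Agent0 training procedure: it replaces both LLM agents with scalar stand-ins, involves no reinforcement learning, and every name in it (Executor, Curriculum, co_evolve, skill, tool_bonus, step, lr) is hypothetical. It only shows the qualitative dynamic that a tool-augmented executor can learn from tasks slightly beyond its unaided ability, which in turn lets the curriculum propose harder tasks.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Executor:
    skill: float       # toy scalar standing in for model capability (assumption)
    tool_bonus: float  # extra reach granted by a tool such as a code interpreter (assumption)

    def can_solve(self, difficulty: float) -> bool:
        # A task is solvable if it lies within skill plus tool-assisted reach.
        return difficulty <= self.skill + self.tool_bonus

    def train_on(self, difficulty: float, lr: float = 0.5) -> None:
        # Skill moves toward the task difficulty only when the task was solvable
        # and harder than the current unaided skill.
        if self.can_solve(difficulty) and difficulty > self.skill:
            self.skill += lr * (difficulty - self.skill)


@dataclass
class Curriculum:
    step: float  # how far beyond the executor's current skill each proposal reaches

    def propose(self, executor: Executor) -> float:
        # "Frontier" task: slightly harder than what the executor handles unaided.
        return executor.skill + self.step


def co_evolve(rounds: int, tool_bonus: float) -> List[float]:
    """Run the toy loop and return the difficulty proposed in each round."""
    executor = Executor(skill=1.0, tool_bonus=tool_bonus)
    curriculum = Curriculum(step=1.0)
    difficulties = []
    for _ in range(rounds):
        task = curriculum.propose(executor)  # curriculum proposes a frontier task
        difficulties.append(task)
        executor.train_on(task)              # executor improves if it could solve it
    return difficulties


if __name__ == "__main__":
    print("with tool:   ", co_evolve(3, tool_bonus=1.0))
    print("without tool:", co_evolve(3, tool_bonus=0.0))
```

In this toy setting, the proposed difficulty climbs each round when the tool bonus is present and stalls when it is absent, mirroring the claim that tool integration pressures the curriculum toward more complex tasks. The real system optimizes both agents with RL on generated tasks, which this sketch does not attempt to model.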
