AgentEvolver: Towards Efficient Self-Evolving Agent System

Yunpeng Zhai, Shuchang Tao, Cheng Chen, Anni Zou, Ziqian Chen, Qingxu Fu, Shinji Mai, Li Yu, Jiaji Deng, Zouying Cao, Zhaoyang Liu, Bolin Ding, Jingren Zhou

TL;DR

AgentEvolver introduces a self-evolving agent framework that leverages LLMs to autonomously drive task generation, exploration, and fine-grained credit assignment, addressing data scarcity and sample inefficiency in long-horizon tool-augmented tasks. It formalizes learning in an open-ended environment by separating the sandbox E from an unknown target task distribution p_target(g) and defines proxy functions F_task and F_reward to generate training tasks and rewards. The framework comprises three synergistic mechanisms—self-questioning (curiosity-driven task generation), self-navigating (experience-guided exploration), and self-attributing (step-wise credit assignment with LLM justification)—coupled with a modular infrastructure (framework, context manager, environment service) enabling scalable, continual improvement. Empirical results on AppWorld and BFCL show substantial gains in exploration efficiency, sample utilization, and adaptation speed over PPO/GRPO baselines, with larger models deriving larger benefits, and ablations demonstrating the individual and combined value of the mechanisms. The work advances a scalable paradigm for autonomous, data-efficient evolution of agent capabilities, and outlines future directions toward larger models and LLM-level self-improvement.

Abstract

Autonomous agents powered by large language models (LLMs) have the potential to significantly enhance human productivity by reasoning, using tools, and executing complex tasks in diverse environments. However, current approaches to developing such agents remain costly and inefficient, as they typically require manually constructed task datasets and reinforcement learning (RL) pipelines with extensive random exploration. These limitations lead to prohibitively high data-construction costs, low exploration efficiency, and poor sample utilization. To address these challenges, we present AgentEvolver, a self-evolving agent system that leverages the semantic understanding and reasoning capabilities of LLMs to drive autonomous agent learning. AgentEvolver introduces three synergistic mechanisms: (i) self-questioning, which enables curiosity-driven task generation in novel environments, reducing dependence on handcrafted datasets; (ii) self-navigating, which improves exploration efficiency through experience reuse and hybrid policy guidance; and (iii) self-attributing, which enhances sample efficiency by assigning differentiated rewards to trajectory states and actions based on their contribution. By integrating these mechanisms into a unified framework, AgentEvolver enables scalable, cost-effective, and continual improvement of agent capabilities. Preliminary experiments indicate that AgentEvolver achieves more efficient exploration, better sample utilization, and faster adaptation compared to traditional RL-based baselines.

Paper Structure

This paper contains 86 sections, 21 equations, 16 figures, 7 tables.

Figures (16)

  • Figure 1: Performance comparison on the AppWorld and BFCL-v3 benchmarks. AgentEvolver achieves superior results while using substantially fewer parameters than larger baseline models.
  • Figure 2: Overview of the AgentEvolver framework. The self-evolving process is driven by three synergistic mechanisms: Self-questioning for autonomous task generation, Self-navigating for experience-guided exploration, and Self-attributing for fine-grained credit assignment.
  • Figure 3: The pipeline of the self-questioning module, comprising exploration, task synthesis, and task curation. The module first explores the environments, guided by both environment profiles and user preferences (detailed in Section \ref{sec:exploration-with-profiles}). The generated trajectories are then analyzed to formulate candidate queries together with their reference solutions. Finally, an agent replays each reference path to verify its feasibility.
  • Figure 4: An example environment profile for the sandbox in Figure \ref{fig:overview-cstb}. The elements in the map are represented as entities with attributes, while the operations enumerate the conceptual actions available, giving agents a basic understanding of the environment.
  • Figure 5: An example experience for AppWorld, consisting of two components: "When to use" and "Content".
  • ...and 11 more figures