SE-Agent: Self-Evolution Trajectory Optimization in Multi-Step Reasoning with LLM-Based Agents

Jiaye Lin, Yifu Guo, Yuzhen Han, Sen Hu, Ziyi Ni, Licheng Wang, Mingguang Chen, Hongzhang Liu, Ronghao Chen, Yangfan He, Daxin Jiang, Binxing Jiao, Chen Hu, Huacan Wang

TL;DR

SE-Agent introduces a trajectory-level self-evolution framework for LLM-based agents that iteratively improves multi-step reasoning through revision, recombination, and refinement. Starting from a diverse pool of pilot trajectories generated via multi-planning and mutation, the framework propagates cross-trajectory knowledge to escape local optima and improve reasoning quality. Evaluated on SWE-bench Verified across five LLMs, integrating SE-Agent delivers up to 55% relative improvement on real-world GitHub issue resolution and reaches state-of-the-art performance among open-source agents. The code is open source, and the approach may extend to broader path-search problems and embodied AI.

Abstract

Large Language Model (LLM)-based agents have recently shown impressive capabilities in complex reasoning and tool use via multi-step interactions with their environments. While these agents have the potential to tackle complicated tasks, their problem-solving process, i.e., the interaction trajectory that leads to task completion, remains underexploited. These trajectories contain rich feedback that can steer agents toward correct solutions. Although prevailing approaches, such as Monte Carlo Tree Search (MCTS), can effectively balance exploration and exploitation, they ignore the interdependence among trajectories and explore insufficiently diverse search spaces, which leads to redundant reasoning and suboptimal outcomes. To address these challenges, we propose SE-Agent, a Self-Evolution framework that enables Agents to optimize their reasoning processes iteratively. Our approach revisits and enhances earlier pilot trajectories through three key operations: revision, recombination, and refinement. This evolutionary mechanism provides two key advantages: (1) it expands the search space beyond local optima by exploring diverse solution paths guided by previous trajectories, and (2) it leverages cross-trajectory inspiration to enhance performance efficiently while mitigating the impact of suboptimal reasoning paths. Through these mechanisms, SE-Agent achieves continuous self-evolution that incrementally improves reasoning quality. We evaluate SE-Agent on SWE-bench Verified, which requires resolving real-world GitHub issues. Experimental results across five strong LLMs show that integrating SE-Agent delivers up to 55% relative improvement, achieving state-of-the-art performance among all open-source agents on SWE-bench Verified. Our code and demonstration materials are publicly available at https://github.com/JARVIS-Xs/SE-Agent.
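
The abstract names the three operators but does not say how each is implemented, so the following is only a minimal sketch of the general shape such a loop might take, not the authors' method. In SE-Agent the operators would presumably be driven by an LLM; here they are replaced by deterministic toy functions so the example runs on its own. Every name (`evolve`, `revise`, `recombine`, `refine`, `reward`, `REQUIRED`, the step labels) and the reward definition are assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# A trajectory is modeled as an ordered tuple of step labels (hypothetical).
Trajectory = Tuple[str, ...]

# Toy task: a trajectory is "good" if it covers these steps without extra noise.
REQUIRED = ("locate", "reproduce", "patch", "test")


def reward(t: Trajectory) -> float:
    """Toy reward: count of required steps covered, minus a small length penalty."""
    return len(set(t) & set(REQUIRED)) - 0.1 * len(t)


def revise(t: Trajectory) -> Trajectory:
    """Toy stand-in for revision: drop steps that do not contribute."""
    return tuple(s for s in t if s in REQUIRED)


def recombine(a: Trajectory, b: Trajectory) -> Trajectory:
    """Toy stand-in for recombination: extend a with the steps only b has."""
    return a + tuple(s for s in b if s not in a)


def refine(t: Trajectory) -> Trajectory:
    """Toy stand-in for refinement: remove repeated steps, keeping order."""
    return tuple(dict.fromkeys(t))


def evolve(
    pool: List[Trajectory],
    score: Callable[[Trajectory], float],
    iterations: int = 3,
    keep: int = 4,
) -> Trajectory:
    """Iteratively apply the three operators and keep the best candidates."""
    pool = list(pool)
    for _ in range(iterations):
        candidates = list(pool)
        # Revision acts on single trajectories.
        candidates += [revise(t) for t in pool]
        # Recombination shares information across pairs of trajectories.
        candidates += [recombine(a, b) for a in pool for b in pool if a != b]
        # Refinement cleans up every candidate.
        candidates = [refine(t) for t in candidates]
        # Selection: deduplicate, then keep the highest-scoring trajectories.
        unique = list(dict.fromkeys(candidates))
        pool = sorted(unique, key=score, reverse=True)[:keep]
    return pool[0]


# Three imperfect pilot trajectories; none covers all required steps.
pilots: List[Trajectory] = [
    ("locate", "guess", "reproduce"),
    ("patch", "guess", "test"),
    ("locate", "locate", "test"),
]
best = evolve(pilots, reward)
print(best, reward(best))
```

In this toy setting no single pilot covers every required step, so revision alone cannot reach a complete trajectory; the complete one only appears once recombination merges steps from two different pilots, which is the cross-trajectory effect the abstract describes.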

Paper Structure

This paper contains 41 sections, 11 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: Overview of our proposed SE-Agent self-evolution framework. Starting from an initial pool of diverse pilot trajectories, the agent iteratively performs three trajectory-level operators—Revision, Recombination, and Refinement—to harvest cross-trajectory insights, escape local optima, and converge to a high-reward solution path that robustly solves the target task.
  • Figure 2: Ablation study of SE-Agent on SWE-bench Verified with three variants.
  • Figure 3: Venn diagram of resolved issues on SWE-bench Verified.
  • Figure 4: Performance of SE-Agent at different numbers of candidate trajectories (left) and its comparison with SWE-Agent and SWE-Search under different maximum API costs (right).
  • Figure 5: A complete case study demonstrating how SE-Agent progressively optimizes trajectories through its three core operations.
  • ...and 3 more figures