SEW: Self-Evolving Agentic Workflows for Automated Code Generation

Siwei Liu, Jinyuan Fang, Han Zhou, Yingxu Wang, Zaiqiao Meng

TL;DR

SEW introduces a self-evolving framework that automatically designs and optimizes multi-agent workflows and per-agent prompts for automated code generation. By employing workflow generation, evolution, and agent evolution guided by mutation operators, SEW discovers novel topologies and high-quality prompts, outperforming strong baselines across three benchmarks. The study analyzes five textual workflow representations, finding CoRE to offer the best balance between interpretability and executability, and demonstrates that both workflow and agent evolution contribute to performance gains. These results highlight SEW’s potential to reduce manual workflow design and enable adaptive, scalable agentic systems for software engineering tasks, while outlining limitations and avenues for broader applicability.

Abstract

Large Language Models (LLMs) have demonstrated effectiveness in code generation tasks. To enable LLMs to address more complex coding challenges, existing research has focused on crafting multi-agent systems with agentic workflows, in which complex coding tasks are decomposed into sub-tasks that are assigned to specialized agents. Despite their effectiveness, current approaches rely heavily on hand-crafted agentic workflows, with both agent topologies and prompts manually designed, which limits their ability to adapt automatically to different types of coding problems. To address these limitations and enable automated workflow design, we propose Self-Evolving Workflow (SEW), a novel self-evolving framework that automatically generates and optimises multi-agent workflows. Extensive experiments on three coding benchmark datasets, including the challenging LiveCodeBench, demonstrate that SEW can automatically design agentic workflows and optimise them through self-evolution, bringing up to 33% improvement on LiveCodeBench compared to using the backbone LLM only. Furthermore, by investigating different workflow representation schemes, we provide insights into the optimal way to encode workflow information as text.

Paper Structure

This paper contains 20 sections, 5 equations, 5 figures, 4 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Illustration of agent and workflow evolution in code generation. The initialized setup (left) includes agents with naive prompts while the evolved setup (right) is equipped with enhanced prompts generated by SEW and a more sophisticated workflow structure.
  • Figure 2: The overall framework of SEW. The process begins with workflow generation, followed by workflow evolution. Each agent within the evolved workflow is then equipped with enhanced prompts generated by the agent evolution module. This module is driven by the Direct Evolution (DE) and Hyper Evolution (HE) operators, both of which leverage LLMs: a mutation prompt $\mathcal{T}_{mut}$ or a hyper-mutation prompt $\mathcal{T}_{hmut}$ is used to enhance the prompt of an agent.
  • Figure 3: Illustration of the Direct Evolution and Hyper Evolution of SEW. Green, yellow and blue boxes indicate the evolutionary prompt, the default agent prompt and the textual output of the evolutionary operators, respectively.
  • Figure 4: A workflow represented by the BPMN and the CoRE schemes, respectively.
  • Figure 5: Performance comparison of Code Rewriting and Task Parsing Workflows under different agent evolution strategies on the LCB dataset.
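
The two operators named in the Figure 2 caption can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the prompt templates, and the stub `echo_llm` are all hypothetical, and the sketch assumes that Hyper Evolution first asks the LLM to write a fresh mutation prompt from $\mathcal{T}_{hmut}$ and then applies it in the same way as Direct Evolution. The caption itself only states that either prompt is used to enhance an agent's prompt.

```python
from typing import Callable

# Any callable mapping a prompt string to a completion string can stand in for an LLM.
LLM = Callable[[str], str]


def direct_evolution(llm: LLM, agent_prompt: str, mutation_prompt: str) -> str:
    """DE (sketch): ask the LLM to rewrite an agent prompt, guided by a mutation prompt."""
    # Hypothetical template; the real prompt layout is not given in the source.
    return llm(f"{mutation_prompt}\n\nAGENT PROMPT:\n{agent_prompt}")


def hyper_evolution(llm: LLM, agent_prompt: str, hyper_mutation_prompt: str) -> str:
    """HE (sketch, assumed two-step form): generate a mutation prompt, then apply it."""
    # Step 1 (assumption): the hyper-mutation prompt yields a new mutation prompt.
    new_mutation_prompt = llm(f"{hyper_mutation_prompt}\n\nAGENT PROMPT:\n{agent_prompt}")
    # Step 2: reuse direct evolution with the freshly generated mutation prompt.
    return direct_evolution(llm, agent_prompt, new_mutation_prompt)


def echo_llm(text: str) -> str:
    """Deterministic stand-in for a real LLM call; it only wraps its input in tags."""
    return f"<evolved>{text}</evolved>"


if __name__ == "__main__":
    naive = "Write code."
    print(direct_evolution(echo_llm, naive, "Improve this prompt."))
    print(hyper_evolution(echo_llm, naive, "Write a better mutation instruction."))
```

Passing the LLM as a plain callable keeps the sketch independent of any particular model API; replacing `echo_llm` with a real client call would be the only change needed to experiment with actual prompts.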