A Self-Improving Coding Agent

Maxime Robeyns, Martin Szummer, Laurence Aitchison

TL;DR

The paper tackles the problem of enabling LLM-powered agents to improve autonomously by editing their own code. It introduces SICA, a fully self-referential coding agent implemented in Python. SICA iteratively edits its own codebase, guided by an archive of benchmark results and a utility function, while an asynchronous overseer runs alongside it to ensure safety and progress. Empirically, performance on a random subset of SWE-Bench Verified rises from 17% to 53%, with further gains on LiveCodeBench and on synthetic tasks, demonstrating a data-efficient, non-gradient-based self-improvement loop. This work provides a practical framework for automatic agent design and opens avenues for jointly training foundation models with their agent systems in the future.
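
The asynchronous overseer mentioned above runs concurrently with the agent and can intervene when a run stops making progress. The sketch below is a minimal illustration of that idea using Python's asyncio, not the paper's implementation: the names Trace, agent, overseer and run are hypothetical, and whatever judgement the real overseer applies is replaced by a hard-coded rule (the same action repeated max_repeats times in a row counts as "stuck").

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Hypothetical shared log of agent actions that the overseer can read."""
    events: list[str] = field(default_factory=list)


async def agent(trace: Trace, steps: int, stuck: bool) -> str:
    # Toy stand-in for the coding agent: each step records one action.
    for i in range(steps):
        # A "stuck" agent repeats the same action instead of making progress.
        trace.events.append("edit tools.py" if stuck else f"step {i}")
        await asyncio.sleep(0.01)
    return "finished"


async def overseer(trace: Trace, task: asyncio.Task, max_repeats: int) -> None:
    # Runs concurrently with the agent and periodically inspects its trace.
    while not task.done():
        await asyncio.sleep(0.005)
        tail = trace.events[-max_repeats:]
        # Hypothetical rule: N identical consecutive actions means "no progress".
        if len(tail) == max_repeats and len(set(tail)) == 1:
            task.cancel()
            return


async def run(stuck: bool, steps: int = 20, max_repeats: int = 5) -> str:
    trace = Trace()
    task = asyncio.create_task(agent(trace, steps, stuck))
    watcher = asyncio.create_task(overseer(trace, task, max_repeats))
    try:
        result = await task
    except asyncio.CancelledError:
        # Awaiting a task that the overseer cancelled raises CancelledError here.
        result = "cancelled by overseer"
    await watcher
    return result
```

With these settings, asyncio.run(run(stuck=True)) should return "cancelled by overseer" after about five repeated actions, while asyncio.run(run(stuck=False)) lets the agent finish. Running the overseer as a separate task, rather than checking inside the agent loop, keeps the monitoring independent of the code the agent is editing.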

Abstract

Recent advancements in Large Language Models (LLMs) have spurred interest in deploying LLM agents to undertake tasks in the world. LLMs are often deployed in agent systems: code that orchestrates LLM calls and provides them with tools. We demonstrate that an agent system, equipped with basic coding tools, can autonomously edit itself, and thereby improve its performance on benchmark tasks. We find performance improves from 17% to 53% on a random subset of SWE-Bench Verified, with additional performance gains on LiveCodeBench, as well as on synthetically generated agent benchmarks. Our work represents an advancement in the automated and open-ended design of agentic systems, and demonstrates a data-efficient, non-gradient-based learning mechanism driven by LLM reflection and code updates.

Paper Structure

This paper contains 16 sections, 2 equations, 4 figures, 1 table, 1 algorithm.

Figures (4)

  • Figure 1: Meta Agent Loop: the agent starts with the minimal code required to support initial self-improvement, and then follows a sequence of benchmarking and meta-improvement.
  • Figure 2: LLM context window structure.
  • Figure 3: Performance across iterations. Key improvements are annotated with their corresponding tool or agent modifications.
  • Figure 4: Agent Framework Saturation: when the underlying model alone (e.g. o3-mini-high) already performs well, the benefits the agent system was able to find were marginal.
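
The meta agent loop of Figure 1 (start from minimal code, then alternate benchmarking and meta-improvement) can be sketched as a toy. This is an illustration under stated assumptions, not the SICA implementation: AgentVersion, benchmark, meta_improve and self_improve are hypothetical names, the "benchmark" simply counts added tools, "meta-improvement" appends one tool instead of an LLM editing the codebase, and the best-scoring version in the archive is assumed to act as the meta agent for the next edit, with the toy score standing in for the paper's utility function.

```python
from dataclasses import dataclass


@dataclass
class AgentVersion:
    code: str           # stand-in for the agent's own source tree
    score: float = 0.0  # toy benchmark score, used here as the utility


def benchmark(agent: AgentVersion) -> float:
    # Hypothetical benchmark: more tools score higher, capped at 1.0.
    return min(1.0, agent.code.count("tool") / 5)


def meta_improve(meta_agent: AgentVersion) -> AgentVersion:
    # Hypothetical stand-in for the agent editing its own code: add one tool.
    n = meta_agent.code.count("tool")
    return AgentVersion(code=meta_agent.code + f"\ntool_{n}")


def self_improve(initial_code: str, iterations: int) -> list[AgentVersion]:
    # Iteration 0: benchmark the minimal starting agent and store it.
    archive = [AgentVersion(initial_code)]
    archive[0].score = benchmark(archive[0])
    for _ in range(iterations):
        # Assumed selection rule: the best agent so far performs the next edit.
        meta_agent = max(archive, key=lambda a: a.score)
        candidate = meta_improve(meta_agent)
        candidate.score = benchmark(candidate)
        archive.append(candidate)  # every version is kept, not just the best
    return archive
```

For example, self_improve("def main(): pass", 3) yields an archive whose scores are [0.0, 0.2, 0.4, 0.6]. Keeping every version in the archive, rather than overwriting the current agent, means a bad edit never destroys the best agent found so far.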