A Self-Improving Coding Agent
Maxime Robeyns, Martin Szummer, Laurence Aitchison
TL;DR
The paper asks whether LLM-powered agents can autonomously improve themselves by editing their own code, and introduces SICA (Self-Improving Coding Agent), a fully self-referential coding agent implemented in Python. SICA iteratively edits its own codebase, guided by an archive of previous agent versions with their benchmark results and by a utility function, while an asynchronous overseer monitors the run for safety and progress. Empirically, performance on a random subset of SWE-Bench Verified rises from 17% to 53%, with further gains on LiveCodeBench and on synthetically generated agent benchmarks, demonstrating a data-efficient, non-gradient-based self-improvement loop. The work offers a practical framework for automated agent design and opens avenues for jointly training foundation models with their agent systems.
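The archive-and-utility loop can be pictured with a minimal Python sketch. It is not SICA's implementation: the `AgentVersion` record, the `utility` weights and caps, and the `evaluate`/`edit` callables are hypothetical stand-ins, the asynchronous overseer is not modeled, and the sketch assumes that the highest-utility version in the archive is the one that proposes the next edit.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class AgentVersion:
    """One entry in the archive (hypothetical record layout)."""
    iteration: int
    code: str        # source of this agent version (a plain string here)
    score: float     # benchmark score in [0, 1]
    cost_usd: float  # LLM spend for the benchmark run
    time_s: float    # wall-clock time for the benchmark run


def utility(v: AgentVersion, w_score: float = 0.5, w_cost: float = 0.25,
            w_time: float = 0.25, cost_cap: float = 10.0,
            time_cap: float = 300.0) -> float:
    """Hypothetical utility: rewards score, penalizes cost and time up to a cap."""
    cost_term = 1.0 - min(1.0, v.cost_usd / cost_cap)
    time_term = 1.0 - min(1.0, v.time_s / time_cap)
    return w_score * v.score + w_cost * cost_term + w_time * time_term


def self_improve(initial_code: str,
                 evaluate: Callable[[str], Tuple[float, float, float]],
                 edit: Callable[[str, List[AgentVersion]], str],
                 iterations: int) -> List[AgentVersion]:
    """Repeatedly pick the best archived agent, let it edit its code, re-benchmark."""
    score, cost, t = evaluate(initial_code)
    archive = [AgentVersion(0, initial_code, score, cost, t)]
    for i in range(1, iterations + 1):
        best = max(archive, key=utility)        # best version so far acts as editor
        new_code = edit(best.code, archive)     # it may inspect the whole archive
        score, cost, t = evaluate(new_code)     # benchmark the edited agent
        archive.append(AgentVersion(i, new_code, score, cost, t))
    return archive


# Toy stand-ins: each edit appends "+", and the fake evaluator rewards "+".
def toy_evaluate(code: str) -> Tuple[float, float, float]:
    return min(1.0, 0.2 + 0.1 * code.count("+")), 1.0, 60.0


def toy_edit(code: str, archive: List[AgentVersion]) -> str:
    return code + "+"


archive = self_improve("agent", toy_evaluate, toy_edit, iterations=3)
print([round(v.score, 2) for v in archive])  # [0.2, 0.3, 0.4, 0.5]
```

Because the archive keeps every version, a poor edit never overwrites a good one: the next iteration simply selects from the best version found so far.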
Abstract
Recent advancements in Large Language Models (LLMs) have spurred interest in deploying LLM agents to undertake tasks in the world. LLMs are often deployed in agent systems: code that orchestrates LLM calls and provides them with tools. We demonstrate that an agent system, equipped with basic coding tools, can autonomously edit itself and thereby improve its performance on benchmark tasks. Performance on a random subset of SWE-Bench Verified improves from 17% to 53%, with additional gains on LiveCodeBench and on synthetically generated agent benchmarks. Our work represents an advancement in the automated and open-ended design of agentic systems, and demonstrates a data-efficient, non-gradient-based learning mechanism driven by LLM reflection and code updates.
