From Language Models to Practical Self-Improving Computer Agents
Alex Sheng
TL;DR
The paper presents a practical workflow for building self-improving AI agents that autonomously generate augmentations to solve diverse computer tasks. It uses a minimal prompt loop and a lightweight execution environment to produce tools from file viewing/editing to retrieval and internet navigation. The approach demonstrates how prompt engineering and tool generation enable progressive capability expansion, though experiments are illustrative rather than rigorous. The work highlights significant practical potential and points to important security and ethical considerations for deploying self-improving agents.
Abstract
We develop a simple and straightforward methodology to create AI computer agents that can carry out diverse computer tasks and self-improve by developing tools and augmentations to enable themselves to solve increasingly complex tasks. As large language models (LLMs) have been shown to benefit from non-parametric augmentations, a significant body of recent work has focused on developing software that augments LLMs with various capabilities. Rather than manually developing static software to augment LLMs through human engineering effort, we propose that an LLM agent can systematically generate software to augment itself. We show, through a few case studies, that a minimal querying loop with appropriate prompt engineering allows an LLM to generate and use various augmentations, freely extending its own capabilities to carry out real-world computer tasks. Starting with only terminal access, we prompt an LLM agent to augment itself with retrieval, internet search, web navigation, and text editor capabilities. The agent effectively uses these various tools to solve problems including automated software development and web-based tasks.
