DARWIN: Dynamic Agentically Rewriting Self-Improving Network
Henry Jiang
TL;DR
DARWIN proposes a Darwin-Gödel-inspired self-improving framework for GPT models. It uses a genetic-algorithm-like loop in which multiple agents mutate one another's training code via prompting and are selected by performance benchmarks. It emphasizes containerized isolation, persistent memory stored in JSON files, and a bidirectional human-in-the-loop (HITL) interface for managing upgrades such as new datasets and changes to the file structure. In experiments using nanoGPT as the training code and the OpenAI API to generate mutations, it reports modest improvements of 1.26% in model FLOPs utilization (MFU) and 2.07% in perplexity over five generations, suggesting potential for scaling evolutionary GPT training. The work contributes an open-source proof-of-concept framework, architectural innovations for memory and HITL, and a discussion of practical pathways for scaling self-improvement under compute constraints.
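To make the mutate-benchmark-select loop concrete, here is a minimal Python sketch under stated assumptions. It is not DARWIN's implementation. A dictionary of hyperparameters (the hypothetical `Config`) stands in for each agent's training code, `toy_mutate` replaces the LLM-prompted code edit with a random learning-rate perturbation, and `toy_fitness` replaces an actual nanoGPT training run with a synthetic perplexity-like score. The names `evolve` and `survivors`, and the elitist selection scheme, are illustrative choices.

```python
import random
from typing import Callable

# Hypothetical stand-in for an agent's "training code": nanoGPT-style hyperparameters.
Config = dict[str, float]


def evolve(
    population: list[Config],
    mutate: Callable[[Config, random.Random], Config],
    fitness: Callable[[Config], float],  # lower is better, e.g. validation perplexity
    generations: int = 5,
    survivors: int = 2,
    seed: int = 0,
) -> tuple[Config, list[float]]:
    """Mutate-evaluate-select loop; returns the best config and best score per generation."""
    rng = random.Random(seed)
    history: list[float] = []
    for _ in range(generations):
        n = len(population)
        # Agent i edits the code of a *different* agent, (i + 1) mod n,
        # mirroring agents that modify one another's training code.
        children = [mutate(population[(i + 1) % n], rng) for i in range(n)]
        # Benchmark parents and children together (elitism); each config is
        # scored once because, in practice, scoring means a full training run.
        pool = population + children
        scored = sorted(((fitness(c), c) for c in pool), key=lambda t: t[0])
        history.append(scored[0][0])
        # Refill the population to its original size from the top survivors.
        elite = [c for _, c in scored[:survivors]]
        population = [dict(elite[k % survivors]) for k in range(n)]
    best = min(population, key=fitness)
    return best, history


def toy_mutate(cfg: Config, rng: random.Random) -> Config:
    # Placeholder for an LLM-proposed code edit: jitter the learning rate.
    child = dict(cfg)
    child["learning_rate"] *= rng.uniform(0.8, 1.25)
    return child


def toy_fitness(cfg: Config) -> float:
    # Placeholder for "train, then measure validation perplexity":
    # a synthetic bowl with its minimum at learning_rate = 6e-4.
    return 10.0 + 1e7 * (cfg["learning_rate"] - 6e-4) ** 2


if __name__ == "__main__":
    start = [{"learning_rate": lr} for lr in (1e-3, 3e-4, 1.5e-3, 2e-4)]
    best, hist = evolve(start, toy_mutate, toy_fitness)
    print(best, hist)
```

Keeping parents in the selection pool makes the best score per generation non-increasing, which suits a setting where every evaluation is a costly training run and regressions should not propagate. In a real system, `fitness` would launch training in an isolated container and return validation perplexity.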
Abstract
DARWIN is an evolutionary GPT model that uses a genetic-algorithm-like optimization structure in which several independent GPT agents are each trained with their own unique training code. In each iteration, the GPT models are prompted to modify one another's training code in a mutation-like manner in an attempt to improve performance; the agents are then benchmarked, and the best-performing ones are selected for the next iteration by the genetic algorithm. For demonstration purposes, and due to budget and time constraints, the OpenAI API is used to prompt training-code improvements and the nanoGPT framework is used as the training code. DARWIN also uses persistent JSON-based memory files that track previous reasoning and code changes so that they can be correlated with changes in model performance, together with a bidirectional interface for human-in-the-loop (HITL) intervention that allows the model to request upgrades such as additional datasets, training scripts, and restructuring of file hierarchies. In experiments, DARWIN achieved a 1.26 percent improvement in model FLOPs utilization (MFU) and a 2.07 percent improvement in perplexity over baseline configurations after 5 iterations of training, suggesting promise as a foundation for scaling evolutionary GPT training.
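The abstract does not specify the layout of the JSON memory files. As an illustration only, the following sketch assumes a hypothetical schema in which every code change is appended, together with the mutating agent's reasoning and the resulting metrics, to a single JSON list on disk. `MemoryEntry`, `append_entry`, `load_entries`, and `best_change` are invented names, and picking the lowest-perplexity entry is just one simple way to relate recorded changes to performance.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class MemoryEntry:
    # Hypothetical schema: DARWIN's actual field names are not given in the text.
    generation: int
    agent_id: str
    reasoning: str     # the mutating agent's stated rationale for the edit
    code_change: str   # short summary or diff of the training-code edit
    perplexity: float  # validation perplexity measured after training
    mfu: float         # model FLOPs utilization as a fraction in [0, 1]


def append_entry(path: Path, entry: MemoryEntry) -> None:
    """Append one entry to a JSON list stored on disk, creating the file if needed."""
    entries = json.loads(path.read_text()) if path.exists() else []
    entries.append(asdict(entry))
    path.write_text(json.dumps(entries, indent=2))


def load_entries(path: Path) -> list[MemoryEntry]:
    """Read the whole memory file back into typed entries."""
    return [MemoryEntry(**d) for d in json.loads(path.read_text())]


def best_change(entries: list[MemoryEntry]) -> MemoryEntry:
    """Return the recorded change that produced the lowest perplexity."""
    return min(entries, key=lambda e: e.perplexity)
```

Rewriting the file on each append keeps it a single JSON document that a prompting agent can read back in full; an append-only JSON Lines file would avoid rewriting it as the history grows.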
