AlgoPilot: Fully Autonomous Program Synthesis Without Human-Written Programs
Xiaoxin Yin
TL;DR
AlgoPilot addresses fully autonomous program synthesis without human-written programs or trajectories by combining reinforcement learning (RL) with a Trajectory Language Model (TLM) trained on trajectories produced by random Python functions. A random double-loop function generator supplies diverse training trajectories, and during RL the TLM acts as a soft constraint that steers the agent toward algorithm-like behavior. The approach is demonstrated on sorting, where the learned operation sequences resemble classical algorithms such as Bubble Sort. The paper presents a concrete RL environment, a transformer-based agent restricted to Compare and Swap operations, and a TLM trained on 1.5M trajectories from the Random Function Generator; it reports high success rates across array sizes and shows that the generated trajectories are interpretable. The work proposes a new paradigm for autonomous algorithm discovery that could reduce reliance on human-written programs and support future research on algorithm synthesis.
Abstract
Program synthesis has traditionally relied on human-provided specifications, examples, or prior knowledge to generate functional algorithms. Existing methods either emulate human-written algorithms or solve specific tasks without generating reusable programmatic logic, which limits their ability to create novel algorithms. We introduce AlgoPilot, an approach for fully automated program synthesis without human-written programs or trajectories. AlgoPilot uses reinforcement learning (RL) guided by a Trajectory Language Model (TLM) to synthesize algorithms from scratch. The TLM, trained on trajectories generated by random Python functions, serves as a soft constraint during the RL process, aligning generated sequences with patterns likely to represent valid algorithms. Using sorting as a test case, we show that AlgoPilot generates trajectories that are interpretable as classical algorithms, such as Bubble Sort, while operating without prior algorithmic knowledge. This work establishes a new paradigm for algorithm discovery and lays the groundwork for future advances in autonomous program synthesis.
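To make the notion of a trajectory concrete, the following is a minimal illustrative sketch of what a Compare/Swap operation sequence might look like when it corresponds to Bubble Sort. It is not the paper's environment or agent: the function names `bubble_sort_trajectory` and `replay`, and the `(operation, i, j)` tuple encoding, are assumptions introduced here for illustration only.

```python
from typing import List, Tuple

# Hypothetical encoding of one step: (operation name, index i, index j).
Op = Tuple[str, int, int]


def bubble_sort_trajectory(values: List[int]) -> Tuple[List[int], List[Op]]:
    """Run Bubble Sort and record every Compare and Swap it performs."""
    arr = list(values)  # work on a copy so the input is left unchanged
    trajectory: List[Op] = []
    # Outer loop shrinks the unsorted prefix; inner loop scans adjacent pairs.
    for end in range(len(arr) - 1, 0, -1):
        for i in range(end):
            trajectory.append(("Compare", i, i + 1))
            if arr[i] > arr[i + 1]:
                arr[i], arr[i + 1] = arr[i + 1], arr[i]
                trajectory.append(("Swap", i, i + 1))
    return arr, trajectory


def replay(values: List[int], trajectory: List[Op]) -> List[int]:
    """Apply only the Swap steps of a trajectory to a fresh copy of the input."""
    arr = list(values)
    for op, i, j in trajectory:
        if op == "Swap":
            arr[i], arr[j] = arr[j], arr[i]
    return arr


if __name__ == "__main__":
    sorted_arr, traj = bubble_sort_trajectory([3, 1, 2])
    for step in traj:
        print(step)
    print(sorted_arr)
```

The double-loop structure of this function is the kind of regularity that a model trained on trajectories from random double-loop functions could plausibly learn to favor, although the sketch makes no claim about how the TLM actually scores sequences.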
