AlgoPilot: Fully Autonomous Program Synthesis Without Human-Written Programs

Xiaoxin Yin

TL;DR

AlgoPilot tackles fully autonomous program synthesis without human-written programs or trajectories by combining reinforcement learning with a Trajectory Language Model trained on trajectories produced by random Python functions. It uses a random double-loop function generator to create diverse trajectories, and a guided RL setup where the TLM provides a soft constraint to steer the agent toward algorithm-like behavior, demonstrated on sorting tasks where the learned sequences resemble classical algorithms such as Bubble Sort. The paper presents a concrete RL environment, a transformer-based agent restricted to Compare/Swap operations, and a 1.5M-trajectory TLM trained via the Random Function Generator, achieving high success rates across array sizes and illustrating interpretability of the generated trajectories. This work introduces a new paradigm for autonomous algorithm discovery with potential to automate algorithm creation across domains, reducing reliance on human-written programs and paving the way for future algorithm synthesis research.

Abstract

Program synthesis has traditionally relied on human-provided specifications, examples, or prior knowledge to generate functional algorithms. Existing methods either emulate human-written algorithms or solve specific tasks without generating reusable programmatic logic, limiting their ability to create novel algorithms. We introduce AlgoPilot, a groundbreaking approach for fully automated program synthesis without human-written programs or trajectories. AlgoPilot leverages reinforcement learning (RL) guided by a Trajectory Language Model (TLM) to synthesize algorithms from scratch. The TLM, trained on trajectories generated by random Python functions, serves as a soft constraint during the RL process, aligning generated sequences with patterns likely to represent valid algorithms. Using sorting as a test case, AlgoPilot demonstrates its ability to generate trajectories that are interpretable as classical algorithms, such as Bubble Sort, while operating without prior algorithmic knowledge. This work establishes a new paradigm for algorithm discovery and lays the groundwork for future advancements in autonomous program synthesis.
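To make the Compare/Swap setting concrete, here is a minimal sketch of an environment whose only actions are `Compare` and `Swap`, together with the trajectory that a hand-written Bubble Sort would leave in it. This is an illustration under stated assumptions, not the paper's implementation: the names `SortEnv` and `bubble_sort_trajectory` and the tuple encoding of operations are hypothetical, and the Bubble Sort here serves only as a reference for what an "algorithm-like" trajectory looks like (AlgoPilot itself is not given such a program).

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical encoding of one step: (operation name, index i, index j).
Op = Tuple[str, int, int]


@dataclass
class SortEnv:
    """Toy environment exposing only Compare and Swap (illustrative assumption)."""
    values: List[int]
    trajectory: List[Op] = field(default_factory=list)

    def compare(self, i: int, j: int) -> bool:
        # Record the operation and report whether values[i] > values[j].
        self.trajectory.append(("Compare", i, j))
        return self.values[i] > self.values[j]

    def swap(self, i: int, j: int) -> None:
        # Record the operation and exchange the two elements.
        self.trajectory.append(("Swap", i, j))
        self.values[i], self.values[j] = self.values[j], self.values[i]

    def is_sorted(self) -> bool:
        return all(a <= b for a, b in zip(self.values, self.values[1:]))


def bubble_sort_trajectory(values: List[int]) -> SortEnv:
    """Run a reference Bubble Sort through the environment and keep its trajectory."""
    env = SortEnv(list(values))
    n = len(env.values)
    for i in range(n - 1):
        for j in range(n - 1 - i):
            if env.compare(j, j + 1):
                env.swap(j, j + 1)
    return env


if __name__ == "__main__":
    env = bubble_sort_trajectory([3, 1, 2])
    print(env.values)
    for op in env.trajectory:
        print(op)
```

A trajectory recorded this way is just a sequence of operations and indices, which is the kind of sequence a language model over trajectories could score.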

Paper Structure (18 sections, 19 equations, 5 figures, 2 algorithms)

Introduction
Related Work
AlgoPilot: Automated Learning of Algorithm without Human Help
Learning to Sort with Reinforcement Learning
Environment of Reinforcement Learning
Reinforcement Learning Agent
Experiment Results
Random Function Generator
Trajectory Language Model (TLM)
Guided Reinforcement Learning
Discussions and Future Work
Expected Numbers of Compare and Swap operations of Quicksort
Model of QuickSort
Expected Number of Comparisons
Expected Number of Swaps
...and 3 more sections

Figures (5)

Figure 1: Success Rate vs. #Episode for various array sizes. We run 100,000 episodes for each array size from 6 to 14 (step 2) and reach a success rate of at least about 95% in each setting, where success means that the model sorts the array within $3 \cdot \text{array\_size}^2$ operations.
Figure 2: #Operations vs. #Episode for various array sizes. We run 100,000 episodes for each array size from 6 to 14 (step 2). Each horizontal line marks the expected number of operations of Quicksort for one array size, and the curve in the same color shows the number of operations used by our model. The model uses fewer operations than Quicksort for smaller array sizes and more operations than Quicksort for larger array sizes.
Figure 3: Comparison of Training Loss and Validation Loss over Training Steps with batch size 16, when training the Trajectory Language Model.
Figure 4: Success Rate and #Operations vs. #Episodes for AlgoPilot with Guided Reinforcement Learning
Figure 5: Discrepancies vs. #Episodes for trajectories generated by AlgoPilot compared with Bubble Sort
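
The operation budget in the Figure 1 caption, $3 \cdot \text{array\_size}^2$, can be tabulated for the array sizes used in the experiments. The sketch below also prints the worst-case operation count of a textbook Bubble Sort for comparison (at most $n(n-1)/2$ compares and $n(n-1)/2$ swaps), which is a standard fact and not a figure taken from the paper. The function names are hypothetical.

```python
def operation_budget(array_size: int) -> int:
    """Budget from the Figure 1 caption: 3 * array_size^2 operations."""
    return 3 * array_size ** 2


def bubble_sort_worst_case_ops(array_size: int) -> int:
    """Textbook Bubble Sort worst case: n(n-1)/2 compares plus n(n-1)/2 swaps."""
    return array_size * (array_size - 1)


if __name__ == "__main__":
    # Array sizes 6 to 14 in steps of 2, as in the figure captions.
    for n in range(6, 15, 2):
        print(n, operation_budget(n), bubble_sort_worst_case_ops(n))
```

Under these assumptions, a worst-case Bubble Sort always fits inside the budget, since $n(n-1) < 3n^2$ for every positive $n$.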
