AlgoPilot: Fully Autonomous Program Synthesis Without Human-Written Programs

Xiaoxin Yin

TL;DR

AlgoPilot tackles fully autonomous program synthesis without human-written programs or trajectories by combining reinforcement learning with a Trajectory Language Model trained on trajectories produced by random Python functions. It uses a random double-loop function generator to create diverse trajectories, and a guided RL setup where the TLM provides a soft constraint to steer the agent toward algorithm-like behavior, demonstrated on sorting tasks where the learned sequences resemble classical algorithms such as Bubble Sort. The paper presents a concrete RL environment, a transformer-based agent restricted to Compare/Swap operations, and a 1.5M-trajectory TLM trained via the Random Function Generator, achieving high success rates across array sizes and illustrating interpretability of the generated trajectories. This work introduces a new paradigm for autonomous algorithm discovery with potential to automate algorithm creation across domains, reducing reliance on human-written programs and paving the way for future algorithm synthesis research.
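To make the idea of trajectories produced by random double-loop functions more concrete, here is a minimal sketch. It is not the paper's Random Function Generator: the function name `random_double_loop_trajectory`, the tuple encoding of operations, and the randomized knobs (loop direction, shrinking inner bound, index offset, swap condition) are all assumptions chosen for illustration.

```python
import random


def random_double_loop_trajectory(n, rng):
    """Run one randomly parameterised double loop over an array of length n
    and record the Compare/Swap operations it performs.

    Hypothetical illustration only; the knobs below are assumptions.
    """
    arr = [rng.random() for _ in range(n)]

    # Randomly chosen structure of the double loop.
    outer = list(range(n))
    if rng.random() < 0.5:
        outer.reverse()                    # iterate the outer loop backwards
    shrink = rng.random() < 0.5            # inner bound depends on outer index
    offset = rng.choice([1, 2])            # distance between compared positions
    swap_if_greater = rng.random() < 0.5   # which comparison outcome triggers a swap

    trajectory = []
    for i in outer:
        inner_end = n - offset - (i if shrink else 0)
        for j in range(max(inner_end, 0)):
            k = j + offset
            greater = arr[j] > arr[k]
            trajectory.append(("Compare", j, k, greater))
            if greater == swap_if_greater:
                arr[j], arr[k] = arr[k], arr[j]
                trajectory.append(("Swap", j, k))
    return trajectory


if __name__ == "__main__":
    for op in random_double_loop_trajectory(5, random.Random(0))[:8]:
        print(op)
```

Sampling many such functions with different seeds would give a corpus of operation sequences that share loop-like regularities without encoding any particular sorting algorithm, which is the kind of data a trajectory model could be trained on.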

Abstract

Program synthesis has traditionally relied on human-provided specifications, examples, or prior knowledge to generate functional algorithms. Existing methods either emulate human-written algorithms or solve specific tasks without generating reusable programmatic logic, limiting their ability to create novel algorithms. We introduce AlgoPilot, a groundbreaking approach for fully automated program synthesis without human-written programs or trajectories. AlgoPilot leverages reinforcement learning (RL) guided by a Trajectory Language Model (TLM) to synthesize algorithms from scratch. The TLM, trained on trajectories generated by random Python functions, serves as a soft constraint during the RL process, aligning generated sequences with patterns likely to represent valid algorithms. Using sorting as a test case, AlgoPilot demonstrates its ability to generate trajectories that are interpretable as classical algorithms, such as Bubble Sort, while operating without prior algorithmic knowledge. This work establishes a new paradigm for algorithm discovery and lays the groundwork for future advancements in autonomous program synthesis.
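One way to picture a language model acting as a soft constraint during RL is reward shaping: the agent's task reward is nudged by how likely the model finds the next operation. The sketch below is a toy under stated assumptions, not the paper's method. A smoothed bigram model stands in for the transformer-based TLM, and the additive log-probability bonus, the `weight` parameter, and the names `BigramTrajectoryModel` and `shaped_reward` are hypothetical.

```python
import math
from collections import Counter, defaultdict


class BigramTrajectoryModel:
    """Toy stand-in for a trajectory language model (NOT a transformer)."""

    def __init__(self, vocab, alpha=1.0):
        self.vocab = list(vocab)
        self.alpha = alpha                    # add-alpha smoothing constant
        self.counts = defaultdict(Counter)    # counts[prev][next]

    def fit(self, trajectories):
        # Each trajectory is a list of operation tokens.
        for traj in trajectories:
            for prev, nxt in zip(["<s>"] + traj[:-1], traj):
                self.counts[prev][nxt] += 1

    def log_prob(self, prev, nxt):
        c = self.counts[prev]
        total = sum(c.values()) + self.alpha * len(self.vocab)
        return math.log((c[nxt] + self.alpha) / total)


def shaped_reward(task_reward, model, prev_token, token, weight=0.1):
    """Assumed shaping rule: task reward plus a weighted log-likelihood bonus."""
    return task_reward + weight * model.log_prob(prev_token, token)


if __name__ == "__main__":
    model = BigramTrajectoryModel(vocab=["Compare", "Swap"])
    model.fit([["Compare", "Swap", "Compare", "Swap"],
               ["Compare", "Compare", "Swap"]])
    # A Swap right after a Compare is scored higher than a Swap after a Swap.
    print(shaped_reward(1.0, model, "Compare", "Swap"))
    print(shaped_reward(1.0, model, "Swap", "Swap"))
```

Because the bonus only shifts the reward rather than forbidding actions, the agent can still take an operation the model finds unlikely when the task reward justifies it, which is what makes the constraint soft in this sketch.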

Paper Structure (18 sections, 19 equations, 5 figures, 2 algorithms)

Figures (5)

  • Figure 1: Success Rate vs. #Episodes for various array sizes. We run 100,000 episodes for each array size from 6 to 14 (in steps of 2) and reach a success rate of about 95% or higher in each setting, where success means that our model sorts the array within $3 \cdot array\_size^2$ operations.
  • Figure 2: #Operations vs. #Episodes for various array sizes. We run 100,000 episodes for each array size from 6 to 14 (in steps of 2). Each horizontal line marks the expected number of operations of Quicksort for one array size, and the curve in the same color shows the number of operations used by our model. The model uses fewer operations than Quicksort for smaller array sizes and more operations than Quicksort for larger array sizes.
  • Figure 3: Comparison of Training Loss and Validation Loss over Training Steps with batch size 16, for training the Trajectory Language Model.
  • Figure 4: Success Rate and #Operations vs. #Episodes for AlgoPilot with Guided Reinforcement Learning
  • Figure 5: Discrepancies vs. #Episodes for trajectories generated by AlgoPilot compared with Bubble Sort
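The quantities in these captions can be illustrated with a short sketch: a reference Bubble Sort trajectory of Compare/Swap operations, the $3 \cdot array\_size^2$ operation budget from Figure 1, and a discrepancy count between two trajectories. The operation encoding and the discrepancy definition (position-wise mismatches plus the length difference) are assumptions made here for illustration; the captions do not state how the paper measures discrepancies.

```python
def bubble_sort_trajectory(values):
    """Return the Compare/Swap operations of Bubble Sort and the sorted list."""
    arr = list(values)
    n = len(arr)
    ops = []
    for i in range(n):
        for j in range(n - 1 - i):
            ops.append(("Compare", j, j + 1))
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                ops.append(("Swap", j, j + 1))
    return ops, arr


def within_budget(ops, n):
    """Check the operation budget of 3 * n^2 mentioned in the Figure 1 caption."""
    return len(ops) <= 3 * n * n


def discrepancies(ops_a, ops_b):
    """Assumed metric: mismatched positions plus the difference in length."""
    mismatched = sum(a != b for a, b in zip(ops_a, ops_b))
    return mismatched + abs(len(ops_a) - len(ops_b))


if __name__ == "__main__":
    ops, result = bubble_sort_trajectory([3, 1, 2])
    print(result, len(ops), within_budget(ops, 3))
    print(discrepancies(ops, ops[:-1]))
```

Bubble Sort on n elements performs n(n-1)/2 comparisons and at most as many swaps, so its trajectory always fits within the 3n² budget used in this sketch.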