Functional Program Synthesis with Higher-Order Functions and Recursion Schemes

Matheus Campos Fernandes

TL;DR

The work addresses the challenge of synthesizing correct programs from specifications by leveraging functional programming and strong typing to constrain the search space. It introduces HOTGP, a higher-order typed genetic programming (GP) approach, and Origami, which uses Recursion Schemes (RS) to guide the synthesis of recursive programs. HOTGP demonstrates competitive performance on program synthesis (PS) benchmarks, while Origami, enhanced with AC/DC diversification, DSLS selection, and new RS patterns, achieves leading success rates across PSB1, PSB2, and PolyPSB and performs competitively with LLMs. Together, these methods show that structuring programs with recursion schemes and enforcing type-safe, pure functional constructs can substantially improve the tractability and quality of PS. The results suggest that recursion-pattern-guided GP and type-driven search are promising directions for scalable, reliable program synthesis in functional settings.

Abstract

Program synthesis is the process of generating a computer program following a set of specifications, such as a set of input-output examples. It can be modeled as a search problem in which the search space is the set of all valid programs. As the search space is vast, brute force is usually not feasible, and search heuristics, such as genetic programming (GP), also have difficulty navigating it without guidance. This text presents two novel GP algorithms that synthesize pure, typed, and functional programs: HOTGP and Origami. HOTGP uses strong types and a functional grammar, synthesizing Haskell code, with support for higher-order functions, $\lambda$-functions, and parametric polymorphism. Experimental results show that HOTGP is competitive with the state of the art. Additionally, Origami is an algorithm that tackles the challenge of effectively handling loops and recursion by exploring Recursion Schemes, in which the programs are composed of well-defined templates with only a few parts that need to be synthesized. The first implementation of Origami can synthesize solutions in several Recursion Schemes and data structures, being competitive with other GP methods in the literature, as well as LLMs. The latest version of Origami employs a novel procedure, called AC/DC, designed to improve the search-space exploration. It achieves considerable improvement over its previous version by raising success rates on every problem. Compared to similar methods in the literature, it has the highest count of problems solved with success rates of $100\%$, $\geq 75\%$, and $\geq 25\%$ across all benchmarks. In $18\%$ of all benchmark problems, it stands as the only method to reach a $100\%$ success rate, being the first known approach to achieve it on any problem in PSB2. It also demonstrates performance competitive with LLMs, achieving the highest overall win-rate against Copilot among all GP methods.
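As a rough illustration of the "well-defined templates with only a few parts that need to be synthesized" idea, the sketch below shows one recursion scheme, a catamorphism (fold) over lists, written in Python purely for illustration; the systems described here synthesize Haskell, and this is not their implementation. The names `cata`, `count_odds`, and `double_all` are hypothetical. The recursion itself lives in the fixed template, and only the combining function and the base value are left as holes that a search procedure would have to fill.

```python
from typing import Callable, Sequence, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def cata(algebra: Callable[[A, B], B], nil: B, xs: Sequence[A]) -> B:
    """Fixed template: fold a list from the right.

    The traversal is fixed. Only `algebra` (how to combine one element
    with the result for the rest of the list) and `nil` (the result for
    the empty list) are the holes a synthesizer would search for.
    """
    acc = nil
    for x in reversed(xs):
        acc = algebra(x, acc)
    return acc


# Two hypothetical fillings of the holes (assumed here for illustration,
# not taken from the paper's results).
def count_odds(xs: Sequence[int]) -> int:
    # algebra: add 1 to the running count when x is odd
    return cata(lambda x, acc: acc + (x % 2), 0, xs)


def double_all(xs: Sequence[int]) -> list:
    # algebra: prepend the doubled element to the already-processed tail
    return cata(lambda x, acc: [2 * x] + acc, [], xs)
```

Because the loop structure is supplied by the template, the candidate programs are small, non-recursive expressions, which is the sense in which a recursion scheme narrows the search space.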

Paper Structure

This paper contains 101 sections, 6 figures, 21 tables, 6 algorithms.

Figures (6)

  • Figure 1: Examples of syntax trees. Leaves (terminals) are represented as ellipses and nodes (functions or non-terminals) as hexagons.
  • Figure 2: An example of crossover on two trees.
  • Figure 3: Percentage of reduction in the number of nodes caused by the refinement process.
  • Figure 4: Distribution of recursion schemes used to solve the full set of PSB1 problems.
  • Figure 5: Success rates under different AC and DC configurations. A value of $1$ indicates execution every generation, $2$ every other generation, $299$ only once just before the final ($300^{\mathit{th}}$) generation, and $\infty$ indicates the procedure is never executed.
  • ...and 1 more figures