Program Synthesis via Test-Time Transduction

Kang-il Lee, Jahyun Koo, Seunghyun Yoon, Minbeom Kim, Hyukhun Koh, Dongryeol Lee, Kyomin Jung

TL;DR

The paper introduces transductive program synthesis and the SYNTRA framework, which leverage test inputs during synthesis to reduce epistemic uncertainty. By framing synthesis as active learning over a finite hypothesis class of program-output tuples on $N$ test inputs, and using a maximin input-selection strategy with a transduction model to prune hypotheses, SYNTRA achieves improved accuracy and efficiency across string transformation (Playgol), code generation (MBPP+), visual reasoning (1D-ARC), and MiniGrid world modeling. Empirical results show substantial task- and example-level gains and sublinear query growth with increasing test inputs, demonstrating robustness to edge cases and scalability. The framework is adaptable to online/human-in-the-loop settings and can enhance or complement direct LLM transduction, with open-source code released for reproducibility and further research.

Abstract

We introduce transductive program synthesis, a new formulation of the program synthesis task that explicitly leverages test inputs during synthesis. While prior approaches to program synthesis, whether based on natural language descriptions or input-output examples, typically aim to generalize from training examples, they often struggle with robustness, especially in real-world settings where training examples are limited and test inputs involve various edge cases. To address this, we propose a novel framework that improves robustness by treating synthesis as active learning over a finite hypothesis class defined by programs' outputs. We use an LLM to predict outputs for selected test inputs and eliminate inconsistent hypotheses, where the inputs are chosen via a greedy maximin algorithm to minimize the number of LLM queries required. We evaluate our approach on four benchmarks: Playgol, MBPP+, 1D-ARC, and programmatic world modeling on MiniGrid. We demonstrate that our method significantly improves program synthesis in both accuracy and efficiency. We release our code at https://github.com/klee972/SYNTRA.
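
The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the released SYNTRA implementation. The function names (`build_hypotheses`, `maximin_index`, `transduce`), the `oracle` callback that stands in for the LLM transduction model, and the toy candidate programs are all hypothetical, and the sketch assumes the oracle receives the candidate outputs and returns one of them. Each hypothesis is the tuple of one program's outputs on all test inputs. The maximin step picks the input whose worst-case answer still eliminates the most hypotheses.

```python
from collections import Counter

def build_hypotheses(programs, test_inputs):
    """Group candidate programs by their output tuple on the test inputs."""
    hyps = {}
    for program in programs:
        outputs = tuple(program(x) for x in test_inputs)
        hyps.setdefault(outputs, program)  # keep one program per distinct tuple
    return hyps

def maximin_index(hypotheses, remaining):
    """Return the input index that maximizes the worst-case number of eliminations."""
    def worst_case_eliminated(i):
        counts = Counter(h[i] for h in hypotheses)
        # Worst case: the answer agrees with the largest group of hypotheses.
        return len(hypotheses) - max(counts.values())
    return max(sorted(remaining), key=worst_case_eliminated)

def transduce(programs, test_inputs, oracle):
    """Query the oracle on chosen inputs until one hypothesis remains."""
    hyps = list(build_hypotheses(programs, test_inputs))
    remaining = set(range(len(test_inputs)))
    queries = 0
    while len(hyps) > 1 and remaining:
        i = maximin_index(hyps, remaining)
        remaining.discard(i)
        candidates = list(dict.fromkeys(h[i] for h in hyps))
        answer = oracle(test_inputs[i], candidates)  # stand-in for an LLM call
        queries += 1
        survivors = [h for h in hyps if h[i] == answer]
        if survivors:  # ignore an answer that matches no candidate
            hyps = survivors
    return hyps[0], queries

# Toy task: every candidate maps the training input 2 to 4.
programs = [lambda x: x * 2, lambda x: x + x, lambda x: x ** 2, lambda x: abs(x) * 2]
test_inputs = [2, 3, -3, 0]

# Hypothetical oracle that happens to know the intended function (doubling).
prediction, num_queries = transduce(programs, test_inputs, lambda x, cands: 2 * x)
print(prediction, num_queries)  # (4, 6, -6, 0) 1
```

In this toy case the three distinct hypotheses all disagree on the input `-3`, so a single query on that edge case settles the task. Querying `3` first could leave two hypotheses alive.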

Paper Structure

This paper contains 43 sections, 3 equations, 4 figures, 9 tables, 1 algorithm.

Figures (4)

  • Figure 1: An example of transductive program synthesis. Given the training examples (rows 1 and 2) as input, the inductive program synthesizer generates a program that satisfies these examples. However, this program produces an incorrect output for the test input in row 4, which represents an edge case.
  • Figure 2: An example of the maximin algorithm. The numbers of eliminated hypotheses in the worst case are shown in the "min" column.
  • Figure 3: Examples from the Playgol, MBPP+, 1D-ARC, and MiniGrid domains. Test outputs are highlighted in green.
  • Figure 4: Experimental results on test input scaling and the unseen test set.