Shedding Light in Task Decomposition in Program Synthesis: The Driving Force of the Synthesizer Model

Janis Zenkner, Tobias Sesterhenn, Christian Bartelt

TL;DR

This paper investigates how task decomposition interacts with program generation by comparing ExeDec, which uses a Subgoal Model plus a Synthesizer Model, against REGISM, which relies on iterative execution-guided synthesis without explicit decomposition. Across RobustFill and DeepCoder domains, ExeDec shows strengths in length generalization and concept composition, but REGISM frequently matches or exceeds ExeDec, suggesting that repeated execution is a key driver. The results reveal domain-dependent roles for decomposition and indicate that the Synthesizer Model's repeated invocation largely governs success, with REGISM approaching ExeDec on average and sometimes outperforming it. The work also positions REGISM as a useful ablation method to isolate decomposition contributions and motivates hybrid approaches that combine decomposition with execution-guided synthesis.

Abstract

Task decomposition is a fundamental mechanism in program synthesis, enabling complex problems to be broken down into manageable subtasks. ExeDec, a state-of-the-art program synthesis framework, employs this approach by combining a Subgoal Model for decomposition and a Synthesizer Model for program generation to facilitate compositional generalization. In this work, we develop REGISM, an adaptation of ExeDec that removes decomposition guidance and relies solely on iterative execution-driven synthesis. By comparing these two exemplary approaches (ExeDec, which leverages task decomposition, and REGISM, which does not), we investigate the interplay between task decomposition and program generation. Our findings indicate that ExeDec exhibits significant advantages in length generalization and concept composition tasks, likely due to its explicit decomposition strategies. At the same time, REGISM frequently matches or surpasses ExeDec's performance across various scenarios, with its solutions often aligning more closely with ground truth decompositions. These observations highlight the importance of repeated execution-guided synthesis in driving task-solving performance, even within frameworks that incorporate explicit decomposition strategies. Our analysis suggests that task decomposition approaches like ExeDec hold significant potential for advancing program synthesis, though further work is needed to clarify when and why these strategies are most effective.
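
The difference between the two control loops described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the three-operation DSL, the brute-force `toy_synthesizer`, the `toy_subgoal_model`, and the `solve` function are all hypothetical stand-ins for the learned models and for the RobustFill and DeepCoder domains. The only point is the structure of the loop: the ExeDec-style branch asks for a subgoal before each synthesis step, while the REGISM-style branch calls the synthesizer directly on the remaining task and relies on execution to update the specification.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

State = Tuple[int, ...]
Pairs = List[Tuple[State, State]]  # (current state, target) for each I/O example

# Hypothetical toy DSL (assumption); the paper's domains are far richer.
DSL: Dict[str, Callable[[State], State]] = {
    "sort": lambda xs: tuple(sorted(xs)),
    "reverse": lambda xs: tuple(reversed(xs)),
    "double": lambda xs: tuple(2 * x for x in xs),
}

def run(ops, state: State) -> State:
    """Execute a sequence of DSL operations on one state."""
    for name in ops:
        state = DSL[name](state)
    return state

def toy_synthesizer(pairs: Pairs, max_depth: int = 3) -> Optional[str]:
    """Stand-in for the Synthesizer Model (assumption: brute-force search).

    Returns the first operation of the shortest operation sequence that maps
    every current state to its target, or None if none exists within max_depth.
    """
    for depth in range(1, max_depth + 1):
        for ops in product(DSL, repeat=depth):
            if all(run(ops, s) == t for s, t in pairs):
                return ops[0]
    return None

def toy_subgoal_model(pairs: Pairs) -> Optional[List[State]]:
    """Stand-in for the Subgoal Model: the next intermediate state per example."""
    op = toy_synthesizer(pairs)
    return None if op is None else [DSL[op](s) for s, _ in pairs]

def solve(pairs: Pairs, use_subgoals: bool, max_steps: int = 4) -> Optional[List[str]]:
    """Iterative execution-guided synthesis, with or without subgoal guidance."""
    program: List[str] = []
    for _ in range(max_steps):
        if all(s == t for s, t in pairs):
            return program
        if use_subgoals:
            # ExeDec-style: predict a subgoal, then synthesize toward it.
            subgoals = toy_subgoal_model(pairs)
            if subgoals is None:
                return None
            step_spec = [(s, g) for (s, _), g in zip(pairs, subgoals)]
        else:
            # REGISM-style: synthesize the next step from the remaining task.
            step_spec = pairs
        op = toy_synthesizer(step_spec)
        if op is None:
            return None
        program.append(op)
        # Execute the step and update the specification with its outputs.
        pairs = [(DSL[op](s), t) for s, t in pairs]
    return program if all(s == t for s, t in pairs) else None

examples = [((3, 1, 2), (2, 4, 6)), ((5, 4), (8, 10))]
print(solve(examples, use_subgoals=True))
print(solve(examples, use_subgoals=False))
```

In this toy setting both variants find the same program, because the stub subgoal model is derived from the same search as the stub synthesizer. With learned models the two components can disagree, which is the situation the comparison between ExeDec and REGISM is meant to probe.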

Paper Structure

This paper contains 22 sections, 2 equations, 13 figures.

Figures (13)

  • Figure 1: ExeDec workflow illustrating the challenges posed by misleading subtasks. The task is taken from the original test set and can be solved by computing the cumulative maximum of the input list. The subgoals are those generated by ExeDec. The decomposition process is shown for a single I/O pair, while the full workflow with all I/O pairs is available in the appendix.
  • Figure 2: ExeDec Workflow: First, the Subgoal Model predicts the next subgoal based on the current task specifications. Next, the Synthesizer Model generates a program using the subtask specifications, aiming to solve the subtask. Finally, the generated program is executed, and its output is used to update the task specifications.
  • Figure 3: Tasks solved by ExeDec, clustered based on subtask and subprogram accuracy. The x-axis represents subtask accuracy per task (overlap between predicted and ground truth subtasks), while the y-axis indicates subprogram accuracy (overlap between predicted programs and ground truth solutions). Values are averaged across compositional generalization categories, based on 8,900+ solved tasks in DeepCoder and 26,000+ in RobustFill.
  • Figure 4: Comparison of the number of decompositions of task solutions in the DeepCoder domain using ExeDec.
  • Figure 5: Compositional generalization results using a beam size of 10. End-to-end test accuracy is the fraction of test tasks solved.
  • ...and 8 more figures