Divide-and-Conquer Meets Consensus: Unleashing the Power of Functions in Code Generation

Jingchang Chen, Hongxuan Tang, Zheng Chu, Qianglong Chen, Zekun Wang, Ming Liu, Bing Qin

TL;DR

This work proposes FunCoder, a code generation framework that combines a divide-and-conquer strategy with functional consensus. Its dynamic function decomposition handles complex requirements, and functional consensus outperforms self-testing in correctness evaluation.

Abstract

Despite recent progress made by large language models in code generation, they still struggle with programs that meet complex requirements. Recent work utilizes plan-and-solve decomposition to decrease the complexity and leverages self-tests to refine the generated program. Yet planning deep-inside requirements in advance can be challenging, and the tests need to be accurate to accomplish self-improvement. To this end, we propose FunCoder, a code generation framework incorporating the divide-and-conquer strategy with functional consensus. Specifically, FunCoder recursively branches off sub-functions as smaller goals during code generation, represented by a tree hierarchy. These sub-functions are then composed to attain more complex objectives. Additionally, we designate functions via a consensus formed by identifying similarities in program behavior, mitigating error propagation. FunCoder outperforms state-of-the-art methods by +9.8% on average on HumanEval, MBPP, xCodeEval and MATH with GPT-3.5 and GPT-4. Moreover, our method demonstrates superiority on smaller models: with FunCoder, StableCode-3b surpasses GPT-3.5 by +18.6% and achieves 97.7% of GPT-4's performance on HumanEval. Further analysis reveals that our proposed dynamic function decomposition is capable of handling complex requirements, and that functional consensus prevails over self-testing in correctness evaluation.
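The abstract describes functional consensus only as choosing functions by "identifying similarities in program behavior". One plausible reading can be sketched as follows: run every candidate implementation on the same inputs, count how often each candidate's outputs agree with the others, and keep the candidate with the highest total agreement. The names `behavior` and `select_by_consensus`, the exact-match agreement measure, and the toy candidates are all assumptions made for illustration; they are not taken from the paper.

```python
from typing import Any, Callable, Sequence


def behavior(fn: Callable[..., Any], inputs: Sequence[tuple]) -> list:
    """Record what fn does on each input; exceptions count as behavior too."""
    trace = []
    for args in inputs:
        try:
            trace.append(("ok", fn(*args)))
        except Exception as exc:  # a crash is recorded, not propagated
            trace.append(("error", type(exc).__name__))
    return trace


def select_by_consensus(candidates: Sequence[Callable[..., Any]],
                        inputs: Sequence[tuple]):
    """Return the candidate that agrees most with its peers, plus all scores.

    Assumption: agreement between two candidates is the number of inputs
    on which their recorded results are exactly equal.
    """
    traces = [behavior(fn, inputs) for fn in candidates]
    scores = []
    for i, trace in enumerate(traces):
        total = 0
        for j, other in enumerate(traces):
            if i != j:
                total += sum(a == b for a, b in zip(trace, other))
        scores.append(total)
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores


# Three hypothetical candidates for "absolute value"; the last one is buggy.
def abs_builtin(x):
    return abs(x)


def abs_branch(x):
    return x if x >= 0 else -x


def abs_buggy(x):
    return x


if __name__ == "__main__":
    chosen, scores = select_by_consensus(
        [abs_builtin, abs_branch, abs_buggy], [(-2,), (0,), (3,)]
    )
    print(chosen.__name__, scores)
```

In this toy run the buggy candidate disagrees with the other two on the negative input, so it receives the lowest score. No reference tests are needed, which is the contrast with self-testing that the abstract draws.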

Paper Structure (62 sections, 2 equations, 5 figures, 13 tables, 2 algorithms)

Figures (5)

  • Figure 1: A flow graph illustrating FunCoder. FunCoder branches off new functions so that sub-goals are tackled iteratively (left), re-composes the sub-functions, and selects the best candidate using functional consensus (right). The bottom-right panel shows how FunCoder writes functions at the hierarchy level.
  • Figure 2: Left: Algorithm for FunCoder, explained in detail in the appendix. Right: Comparison between decomposition by planning and our approach. FunCoder introduces new functions to describe sub-goals solely with code, achieving a more natural way of requirement decomposition.
  • Figure 3: (a) Preliminary study on self-testing, in which the programs are evaluated using unit tests generated by LLMs. (b) The effectiveness of different ranking strategies. We compute Pass@k over the top-k programs ranked by functional consensus, self-test, and random on 11 candidates. (Higher is better.)
  • Figure 4: Average accuracy in each level with the chat model (GPT-3.5) and the code model (StableCode$_{3b}$) on the MATH benchmark.
  • Figure 5: Left: Algorithm for FunCoder. Right: Decomposition example of A[B[DE]C].
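
The bracket notation in the Figure 5 caption, A[B[DE]C], can be read as a function tree in which A calls B and C, and B calls D and E. The sketch below parses that notation and lists the functions children-first, which is one natural order for composing sub-functions before their parents. It assumes single-character function names and that the bottom-up step visits nodes in post-order; `Node`, `parse`, and `conquer_order` are hypothetical helpers, not the paper's algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)


def parse(spec: str) -> Node:
    """Parse bracket notation such as 'A[B[DE]C]' into a tree of Nodes."""
    root = None
    stack = []   # ancestors whose child lists are currently open
    last = None  # most recently created node
    for ch in spec:
        if ch == "[":
            stack.append(last)   # following names are children of `last`
        elif ch == "]":
            stack.pop()
        else:
            node = Node(ch)
            if stack:
                stack[-1].children.append(node)
            elif root is None:
                root = node
            last = node
    if root is None:
        raise ValueError("empty specification")
    return root


def conquer_order(node: Node) -> list:
    """Post-order traversal: every child is listed before its parent."""
    order = []
    for child in node.children:
        order.extend(conquer_order(child))
    order.append(node.name)
    return order


if __name__ == "__main__":
    tree = parse("A[B[DE]C]")
    print(conquer_order(tree))  # ['D', 'E', 'B', 'C', 'A']
```

Under these assumptions the leaves D and E are completed first, then B, then C, and finally the root A.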