Top Pass: Improve Code Generation by Pass@k-Maximized Code Ranking

Zhi-Cun Lyu, Xin-Ye Li, Zheng Xie, Ming Li

TL;DR

Top Pass addresses the practical need to find correct code in a large pool of LLM-generated candidate programs by directly optimizing a pass@k loss, which pushes correct solutions toward the top of the candidate list. It introduces a neural ranker with a pass@k-based objective, using selective positive and negative subsets to stabilize training and support multiple k values. Empirical results across CodeContests, APPS, MBPP, and HumanEval show notable improvements in pass@k (e.g., up to a 32.9% relative gain in pass@1 on CodeContests) and robustness to false positives. The approach improves the usability of code-generation systems by reducing the number of candidates a user must inspect, and it lays groundwork for integrating pass@k optimization into reinforcement-learning-based code generation pipelines in the future.

Abstract

Code generation has been greatly enhanced by recent advances in Large Language Models (LLMs). Nevertheless, LLM-based code generation approaches still struggle to produce error-free code within a few tries when faced with complex problems. To address this, the prevailing strategy is to sample a huge number of candidate programs in the hope that at least one of them works. However, users of code generation systems usually expect to find a correct program by reviewing or testing only a small number of candidates; otherwise, the system is unhelpful. In this paper, we propose Top Pass, a code ranking approach that identifies potentially correct solutions among a large number of candidates. Top Pass directly optimizes the pass@k loss function, enhancing the quality at the top of the candidate list, so that the user can find a correct solution in as few tries as possible. Experimental results on four benchmarks indicate that Top Pass enhances the usability of code generation models by producing better ranking results, notably achieving a 32.9% relative improvement in pass@1 on CodeContests compared to the state-of-the-art ranking method.

Paper Structure

This paper contains 26 sections, 9 equations, 11 figures, and 8 tables.

Figures (11)

  • Figure 1: Code generation system with or without Top Pass. The user can afford to test or review only a few code candidates, so Top Pass significantly enhances the practical value of code generation systems.
  • Figure 2: Top Pass minimizes a novel pass@k loss function that enhances the ranking quality at the top of the code candidate list, so that the user can solve the programming task with fewer attempts.
  • Figure 3: Examples of (a) top positive, (b) top negative, (c) bottom positive, and (d) bottom negative. The pass@k loss places more weight on the top positive/negative codes, directing the ranking model towards identifying high-quality solutions rather than separating indistinguishable wrong codes.
  • Figure 4: The influence of the false positive rate in the training dataset on various methods, measured by pass@k for $k=1,3$.
  • Figure 5: The impact of the number of samples at test time on pass@1.
  • ...and 6 more figures

Theorems & Definitions (2)

  • Definition 1: Problem-level pass@k
  • Definition 2: Expectation of pass@k for a random candidate list (Chen et al., 2021)
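
The formal statements of the two definitions are not reproduced in this listing. As a rough illustration, the following Python sketch assumes the usual reading of each: problem-level pass@k for a ranked list is 1 if any of the k highest-scored candidates is correct, and the expectation for a randomly ordered list is the standard combinatorial estimate 1 - C(n-c, k) / C(n, k) for n candidates of which c are correct. The function names `ranked_pass_at_k` and `expected_pass_at_k` are hypothetical and do not come from the paper.

```python
from math import comb
from typing import Sequence


def ranked_pass_at_k(scores: Sequence[float], is_correct: Sequence[bool], k: int) -> int:
    """Assumed reading of problem-level pass@k for one ranked candidate list.

    Returns 1 if at least one of the k highest-scored candidates is correct,
    otherwise 0.
    """
    if len(scores) != len(is_correct):
        raise ValueError("scores and is_correct must have the same length")
    if k < 1:
        raise ValueError("k must be at least 1")
    # Indices of candidates sorted by ranker score, best first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return int(any(is_correct[i] for i in order[:k]))


def expected_pass_at_k(n: int, c: int, k: int) -> float:
    """Expected pass@k when k of n candidates are drawn uniformly at random.

    With c correct candidates, the chance that all k draws are incorrect is
    C(n - c, k) / C(n, k); pass@k is the complement of that probability.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # math.comb returns 0 when k > n - c, so the result is then exactly 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Under these assumptions, a ranker is useful on a problem when `ranked_pass_at_k` for its ordering exceeds what `expected_pass_at_k` predicts for a random ordering of the same candidates. For example, with 10 candidates of which 2 are correct, a random order gives pass@1 of 0.2, whereas any ranker that places a correct candidate first gives pass@1 of 1.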