
LLM4EFFI: Leveraging Large Language Models to Enhance Code Efficiency and Correctness

Tong Ye, Weigang Huang, Xuhong Zhang, Tengfei Ma, Peiyu Liu, Jianwei Yin, Wenhai Wang

TL;DR

This work targets the dual objectives of correctness and efficiency in large language model (LLM) code generation. It introduces Llm4Effi, a framework that splits efficiency optimization into logic-domain algorithm exploration and code-domain implementation optimization, followed by verification-based correctness refinement using synthetic test cases. The approach first explores multiple algorithms and their complexities at the logic level, then implements and refines efficient code, and finally ensures correctness through a bidirectional test-case verification loop. Empirical results across multiple backbones and benchmarks (EvalPerf, ENAMEL, Mercury) show consistent improvements in both efficiency metrics and correctness, supporting the efficiency-first paradigm and domain separation as effective strategies for automated, high-quality code generation.
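The stage ordering described above can be sketched as a small Python skeleton. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `run_pipeline` and every stage (`formalize`, `explore_algorithms`, `implement`, `synthesize_tests`, `select`) are hypothetical callables standing in for LLM prompts or execution steps. What it shows is the order: efficiency is handled first (logic domain, then code domain), and correctness only afterwards.

```python
from typing import Any, Callable, List


def run_pipeline(
    task: str,
    formalize: Callable[[str], str],
    explore_algorithms: Callable[[str], List[str]],
    implement: Callable[[str, str], str],
    synthesize_tests: Callable[[str], List[Any]],
    select: Callable[[List[str], List[Any]], str],
) -> str:
    """Hypothetical efficiency-first pipeline; every stage is an injected callable."""
    # 1. Rewrite the natural-language task as a code-oriented description.
    spec = formalize(task)
    # 2. Logic domain: propose candidate algorithms / pseudocode with complexities.
    plans = explore_algorithms(spec)
    # 3. Code domain: turn each plan into an optimized implementation.
    candidates = [implement(spec, plan) for plan in plans]
    # 4. Correctness comes last: synthesize tests and pick a verified candidate.
    tests = synthesize_tests(spec)
    return select(candidates, tests)


# Usage with trivial stubs in place of model calls.
result = run_pipeline(
    "Return the sum of a list of integers.",
    formalize=lambda t: "def solve(xs: list[int]) -> int  # " + t,
    explore_algorithms=lambda spec: ["single pass accumulation, O(n)", "built-in sum, O(n)"],
    implement=lambda spec, plan: f"code for: {plan}",
    synthesize_tests=lambda spec: [(([1, 2],), 3)],
    select=lambda cands, tests: cands[-1],
)
print(result)  # code for: built-in sum, O(n)
```

Because each stage is injected, the skeleton runs with stubs; in a real system each callable would wrap a model call or a sandboxed execution step.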

Abstract

Large Language Models (LLMs), particularly Code LLMs, have demonstrated impressive performance in code generation. Current research primarily focuses on the correctness of generated code, while efficiency remains less explored. Recent work improves efficiency by modifying an initial version of the code. However, such refinements are limited by the algorithmic design and overall logic of that initial code, resulting in only incremental improvements. In contrast, when human developers write high-quality code, they typically begin by designing several potential solutions at the logical level, evaluating various algorithms and their complexities, and only then implement and optimize the chosen solution. In this study, we introduce Llm4Effi (Large Language Model for Code Efficiency), a novel framework that enables LLMs to generate code that balances efficiency and correctness. Specifically, Llm4Effi divides the efficiency optimization process into two domains: algorithmic exploration in the logic domain and implementation optimization in the code domain. The correctness of the code is then ensured through a synthetic test-case refinement process. This approach, which prioritizes efficiency before ensuring correctness, offers a new paradigm for efficient code generation. Experiments demonstrate that Llm4Effi consistently improves both efficiency and correctness, achieving new state-of-the-art performance on code efficiency benchmarks across various LLM backbones.
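To make the contrast between refining an initial solution and exploring algorithms up front concrete, the toy example below (our illustration, not taken from the paper) lists three candidate algorithms for a hypothetical task, "does any pair of numbers sum to `target`?", each tagged with its time complexity. Polishing the brute-force version line by line leaves it at O(n^2); choosing among algorithms at the logic level reaches the O(n) hash-set solution directly. The `Candidate` record, the `COST_RANK` table, and the pick-the-cheapest rule are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    name: str                                # logic domain: the algorithm idea
    complexity: str                          # declared time complexity
    impl: Callable[[List[int], int], bool]   # code domain: a concrete implementation


def has_pair_bruteforce(nums: List[int], target: int) -> bool:
    # O(n^2): check every pair of distinct positions.
    n = len(nums)
    return any(nums[i] + nums[j] == target for i in range(n) for j in range(i + 1, n))


def has_pair_sorted(nums: List[int], target: int) -> bool:
    # O(n log n): sort, then move two pointers inward.
    s = sorted(nums)
    i, j = 0, len(s) - 1
    while i < j:
        total = s[i] + s[j]
        if total == target:
            return True
        if total < target:
            i += 1
        else:
            j -= 1
    return False


def has_pair_hashset(nums: List[int], target: int) -> bool:
    # O(n): one pass, remembering values already seen.
    seen = set()
    for x in nums:
        if target - x in seen:
            return True
        seen.add(x)
    return False


# Hypothetical ranking of complexity classes, cheapest first.
COST_RANK = {"O(n)": 0, "O(n log n)": 1, "O(n^2)": 2}


def pick_most_efficient(cands: List[Candidate]) -> Candidate:
    return min(cands, key=lambda c: COST_RANK[c.complexity])


candidates = [
    Candidate("brute force over all pairs", "O(n^2)", has_pair_bruteforce),
    Candidate("sort + two pointers", "O(n log n)", has_pair_sorted),
    Candidate("single pass with hash set", "O(n)", has_pair_hashset),
]
best = pick_most_efficient(candidates)
print(best.name)  # single pass with hash set
```

Selecting by declared complexity only makes sense if the candidates are functionally equivalent, which is why correctness still has to be verified afterwards.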


Paper Structure

This paper contains 29 sections, 23 figures, and 3 tables.

Figures (23)

  • Figure 1: Comparison of Llm4Effi with existing methods. Existing methods generate code first, then optimize it using strategy and execution profiles. In contrast, Llm4Effi starts with the task, focusing on efficiency through algorithm exploration and implementation, followed by correctness refinement.
  • Figure 2: The workflow of Llm4Effi. Given a programming task, Llm4Effi formalizes it into a code-oriented description, generates optimal algorithms and pseudocode in the logic domain, and then produces implementation suggestions in the code domain. Llm4Effi then synthesizes test cases and uses a verification-based adaptive framework to evaluate candidate solutions. The final code is selected based on the highest pass rate on the "checked" test cases.
  • Figure 3: The Beyond@1 performance of Llm4Effi on tasks of varying difficulty levels in Mercury, with DeepSeek-V3 as the backbone.
  • Figure 4: Task Formalization.
  • Figure 5: Checking the Task Formalization Result.
  • ...and 18 more figures
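The selection step in Figure 2 (choose the candidate with the highest pass rate on "checked" test cases) can be sketched as follows. This summary does not state the paper's exact criterion for marking a synthetic test as checked, so the rule here is an assumption: a test counts as checked when at least a fraction `min_agree` of the candidates reproduce its expected output (code verifying tests), and candidates are then ranked by pass rate on the checked tests (tests verifying code). All function names are hypothetical.

```python
from typing import Any, Callable, List, Tuple

# A synthetic test case: (positional arguments, expected output).
TestCase = Tuple[tuple, Any]
Solution = Callable[..., Any]


def passes(fn: Solution, test: TestCase) -> bool:
    """Run one candidate on one test; a crash counts as a failure."""
    args, expected = test
    try:
        return fn(*args) == expected
    except Exception:
        return False


def check_tests(candidates: List[Solution], tests: List[TestCase],
                min_agree: float = 0.5) -> List[TestCase]:
    """Code -> tests (assumed rule): keep a test only if enough candidates agree with it."""
    checked = []
    for test in tests:
        agree = sum(passes(fn, test) for fn in candidates) / len(candidates)
        if agree >= min_agree:
            checked.append(test)
    return checked


def select_best(candidates: List[Solution], tests: List[TestCase],
                min_agree: float = 0.5) -> Solution:
    """Tests -> code: return the candidate with the highest pass rate on checked tests."""
    checked = check_tests(candidates, tests, min_agree)
    if not checked:
        return candidates[0]  # hypothetical fallback when no test survives checking
    # max() returns the first maximal item, so list order breaks ties.
    return max(candidates,
               key=lambda fn: sum(passes(fn, t) for t in checked) / len(checked))


# Usage: three candidates for "sum a list of integers".
def builtin_sum(xs):
    return sum(xs)


def loop_sum(xs):
    total = 0
    for x in xs:
        total += x
    return total


def buggy_sum(xs):
    return sum(xs[1:])  # bug: skips the first element


tests = [(([1, 2, 3],), 6), (([],), 0), (([5],), 5),
         (([2, 2],), 5)]  # the last synthetic test has a wrong expected output

best = select_best([builtin_sum, loop_sum, buggy_sum], tests)
print(best.__name__)  # builtin_sum
```

In this run no candidate reproduces the faulty fourth test, so it is discarded instead of penalizing the correct solutions. Ties are broken by list order, so placing the candidate believed to be most efficient first is one simple way to favor efficiency among equally correct solutions.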