EffiCoder: Enhancing Code Generation in Large Language Models through Efficiency-Aware Fine-tuning

Dong Huang, Guangtao Zeng, Jianbo Dai, Meng Luo, Han Weng, Yuhao Qing, Heming Cui, Zhijiang Guo, Jie M. Zhang

TL;DR

EffiCoder targets the overlooked aspect of efficiency in LLM-generated code by creating EffiInstruct, an efficiency-oriented fine-tuning dataset assembled from diverse open-source tasks and evaluated via local execution. The approach uses multiple LLMs to generate candidate solutions, selects the most efficient per task, and fine-tunes models to produce faster, more memory-efficient code without sacrificing correctness. Across Python, data-science tasks, and multilingual code, EffiInstruct achieves substantial improvements in execution time, memory usage, and pass@1 scores compared to baselines like PIE and Mercury. The work also emphasizes scalability and open science by releasing training data, code, and models, aiming to accelerate research into sustainable, high-performance AI-assisted coding.

Abstract

As large language models (LLMs) play an increasingly important role in code generation, enhancing both correctness and efficiency has become crucial. Current methods primarily focus on correctness, often overlooking efficiency. To address this gap, we introduce EffiCoder, which improves both aspects by fine-tuning LLMs on a high-quality dataset of correct and efficient code samples. Our methodology uses multiple LLMs to generate diverse candidate code solutions for various tasks across different programming languages. We then evaluate these solutions by measuring their execution time and memory usage through local execution. The code solution with the lowest execution time and memory consumption is selected as the final output for each task. Experimental results demonstrate significant improvements when fine-tuning with EffiInstruct. For instance, Qwen2.5-Coder-7B-Instruct's pass@1 score increases from 44.8% to 57.7%, while the average execution time for correct tasks decreases by 48.4%. EffiCoder offers a scalable and effective solution for advancing AI-driven code generation, benefiting software development and computational problem-solving. The source code of EffiCoder is available at https://github.com/huangd1999/EffiCoder.
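
The selection step described in the abstract (run each candidate locally, measure time and memory, keep the most efficient correct solution) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the candidates are plain Python functions standing in for LLM-generated solutions, the names `profile_candidate` and `select_most_efficient` are hypothetical, and ranking by time first and peak memory second is an assumed tie-breaking rule, since the abstract does not say how the two metrics are combined.

```python
import time
import tracemalloc
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# A test case is (positional arguments, expected output).
TestCase = Tuple[tuple, object]


@dataclass
class Profile:
    name: str
    seconds: float
    peak_bytes: int


def profile_candidate(
    name: str, func: Callable, tests: List[TestCase]
) -> Optional[Profile]:
    """Run one candidate on all tests; return None if it is incorrect or crashes."""
    tracemalloc.start()
    try:
        start = time.perf_counter()
        for args, expected in tests:
            if func(*args) != expected:
                return None  # incorrect candidates are never selected
        seconds = time.perf_counter() - start
        _, peak_bytes = tracemalloc.get_traced_memory()
        return Profile(name, seconds, peak_bytes)
    except Exception:
        return None
    finally:
        tracemalloc.stop()


def select_most_efficient(
    candidates: Dict[str, Callable], tests: List[TestCase]
) -> Optional[Profile]:
    """Keep correct candidates and pick the best one.

    Assumption: rank by wall-clock time, then by peak memory.
    """
    profiles = [profile_candidate(n, f, tests) for n, f in candidates.items()]
    correct = [p for p in profiles if p is not None]
    return min(correct, key=lambda p: (p.seconds, p.peak_bytes), default=None)


# Hypothetical candidates for the task "does the list contain duplicates?".
def has_duplicates_nested(xs):
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            if xs[i] == xs[j]:
                return True
    return False


def has_duplicates_set(xs):
    return len(set(xs)) != len(xs)


def has_duplicates_wrong(xs):
    return False  # fails any test that contains a duplicate


TESTS: List[TestCase] = [
    (([1, 2, 3, 2],), True),
    ((list(range(1000)),), False),
]
CANDIDATES = {
    "nested_loops": has_duplicates_nested,
    "set_based": has_duplicates_set,
    "always_false": has_duplicates_wrong,
}

if __name__ == "__main__":
    best = select_most_efficient(CANDIDATES, TESTS)
    print(best)
```

Measured times vary between runs and machines, so a real pipeline would likely repeat each measurement and isolate candidates in separate processes; the sketch omits both for brevity.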

Paper Structure

This paper contains 33 sections, 6 equations, 5 figures, and 12 tables.

Figures (5)

  • Figure 1: Overview of the construction pipeline for EffiInstruct. We begin by collecting the initial tasks from different open-source datasets. Starting from each task's original code, we use multiple LLMs to generate candidate solutions, profile their execution overhead with test cases, and keep the most efficient LLM-generated solution for each task. The resulting fine-tuning dataset, EffiInstruct, consists of optimized code and rich metadata and is designed to train models to generate efficient code.
  • Figure 2: Examples of code with varying efficiency levels: The first solution has high memory usage and long execution time. The second reduces memory usage but still has a long execution time. The third is optimized for low memory usage and fast execution.
  • Figure 3: Correlation between the efficiency of the automatically generated code and that of the LLM's code training set.
  • Figure 4: Efficiency distribution of the Python subset collected from Hugging Face. The figure shows the distribution of execution time, memory usage, and maximum memory peak for both the inefficient (task-provided) solutions and the efficient solutions in EffiInstruct. The inefficient solutions have higher overhead than the efficient ones on all three metrics.
  • Figure 5: Case study for EffiBench task problem_idx=2305, comparing code generated by Qwen2.5-Coder-7B with code generated by the EffiInstruct fine-tuned Qwen2.5-Coder-7B.
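
The three efficiency levels described in the Figure 2 caption can be illustrated with a small Python sketch. The task and all three functions below are hypothetical examples chosen for this illustration and are not the code shown in the paper's figure: the first builds a full list (more memory, linear time), the second streams values through a generator (less memory, still linear time), and the third uses a closed-form arithmetic-series formula (constant memory and time).

```python
import tracemalloc


def sum_multiples_list(n):
    """High memory, linear time: materializes every multiple before summing."""
    multiples = [i for i in range(n) if i % 3 == 0 or i % 5 == 0]
    return sum(multiples)


def sum_multiples_generator(n):
    """Low memory, linear time: streams values instead of storing them."""
    return sum(i for i in range(n) if i % 3 == 0 or i % 5 == 0)


def sum_multiples_formula(n):
    """Low memory, constant time: inclusion-exclusion over arithmetic series."""
    def series(k):
        m = (n - 1) // k  # number of positive multiples of k below n
        return k * m * (m + 1) // 2

    return series(3) + series(5) - series(15)


def peak_bytes(func, n):
    """Peak memory allocated by Python objects while func(n) runs."""
    tracemalloc.start()
    try:
        func(n)
        _, peak = tracemalloc.get_traced_memory()
        return peak
    finally:
        tracemalloc.stop()


if __name__ == "__main__":
    n = 100_000
    for f in (sum_multiples_list, sum_multiples_generator, sum_multiples_formula):
        print(f.__name__, f(n), peak_bytes(f, n))
```

All three functions return the same result; only their time and memory overheads differ, which is the kind of gap the profiling step in Figure 1 is meant to detect.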