AutoCoder: Enhancing Code Large Language Model with \textsc{AIEV-Instruct}

Bin Lei; Yuchen Li; Qiuwu Chen

AutoCoder: Enhancing Code Large Language Model with \textsc{AIEV-Instruct}

Bin Lei, Yuchen Li, Qiuwu Chen

TL;DR

AutoCoder introduces AIEV-Instruct, a two-stage agent-interaction and execution-verified method for generating high-quality code datasets without heavy reliance on closed-source teachers. Trained on this data, AutoCoder-33B achieves state-of-the-art Pass@1 on HumanEval, surpassing GPT-4 Turbo and GPT-4o, and demonstrates a versatile code interpreter capable of installing external packages. The approach is validated across Python coding, multilingual programming, and data-science tasks, highlighting the viability of open-source code LLMs built with execution-verified data. Overall, the work provides a scalable framework for high-quality code data generation and opens new avenues for improving code-aware LLMs with autonomous self-learning.

Abstract

We introduce AutoCoder, the first Large Language Model to surpass GPT-4 Turbo (April 2024) and GPT-4o in pass@1 on the Human Eval benchmark test ($\mathbf{90.9\%}$ vs. $\mathbf{90.2\%}$). In addition, AutoCoder offers a more versatile code interpreter compared to GPT-4 Turbo and GPT-4o. It's code interpreter can install external packages instead of limiting to built-in packages. AutoCoder's training data is a multi-turn dialogue dataset created by a system combining agent interaction and external code execution verification, a method we term \textbf{\textsc{AIEV-Instruct}} (Instruction Tuning with Agent-Interaction and Execution-Verified). Compared to previous large-scale code dataset generation methods, \textsc{AIEV-Instruct} reduces dependence on proprietary large models and provides execution-validated code dataset. The code and the demo video is available in \url{https://github.com/bin123apple/AutoCoder}.

AutoCoder: Enhancing Code Large Language Model with \textsc{AIEV-Instruct}

TL;DR

Abstract

We introduce AutoCoder, the first Large Language Model to surpass GPT-4 Turbo (April 2024) and GPT-4o in pass@1 on the Human Eval benchmark test (

vs.

). In addition, AutoCoder offers a more versatile code interpreter compared to GPT-4 Turbo and GPT-4o. It's code interpreter can install external packages instead of limiting to built-in packages. AutoCoder's training data is a multi-turn dialogue dataset created by a system combining agent interaction and external code execution verification, a method we term \textbf{\textsc{AIEV-Instruct}} (Instruction Tuning with Agent-Interaction and Execution-Verified). Compared to previous large-scale code dataset generation methods, \textsc{AIEV-Instruct} reduces dependence on proprietary large models and provides execution-validated code dataset. The code and the demo video is available in \url{https://github.com/bin123apple/AutoCoder}.

Paper Structure (13 sections, 1 equation, 6 figures, 3 tables)

This paper contains 13 sections, 1 equation, 6 figures, 3 tables.

Introduction
AIEV-Instruct
Overall Architecture
Dataset Analysis
AutoCoder
Code Interpreter
Training
Experiment
Python Text to Code Generation
Multilingual Code Generation
Code Generation for Data Science
Comparison with the Base Model
Conclusion

Figures (6)

Figure 1: Pass@1 ($\%$) comparison of Various LLMs on the HumanEval Base Test.
Figure 2: Comparison of Code Interpreter Functions between AutoCoder and GPT-4o. : Nature language generated by the model;: Code generated by the model. AutoCoder can recognize external package installation commands, whereas GPT-4o can only run code that includes built-in packages. The demo video is in https://github.com/bin123apple/AutoCoder .
Figure 3: The overall architecture of the AIEV-Instruct.
Figure 4: The comparison between the AutoCoder-AIEV-Instruct and other large code datasets.
Figure 5: AutoCoder-AIEV-Instruct dataset post-processing.:Nature language;:Code execution request from the User;:Code execution request response from the Assistant; :Bash command;:Code block;:Special token;:Execution result.
...and 1 more figures

AutoCoder: Enhancing Code Large Language Model with \textsc{AIEV-Instruct}

TL;DR

Abstract

AutoCoder: Enhancing Code Large Language Model with \textsc{AIEV-Instruct}

Authors

TL;DR

Abstract

Table of Contents

Figures (6)