
AutoCoder: Enhancing Code Large Language Model with \textsc{AIEV-Instruct}

Bin Lei, Yuchen Li, Qiuwu Chen

TL;DR

AutoCoder introduces AIEV-Instruct, a two-stage agent-interaction and execution-verified method for generating high-quality code datasets without heavy reliance on closed-source teachers. Trained on this data, AutoCoder-33B achieves state-of-the-art Pass@1 on HumanEval, surpassing GPT-4 Turbo and GPT-4o, and demonstrates a versatile code interpreter capable of installing external packages. The approach is validated across Python coding, multilingual programming, and data-science tasks, highlighting the viability of open-source code LLMs built with execution-verified data. Overall, the work provides a scalable framework for high-quality code data generation and opens new avenues for improving code-aware LLMs with autonomous self-learning.

Abstract

We introduce AutoCoder, the first Large Language Model to surpass GPT-4 Turbo (April 2024) and GPT-4o in pass@1 on the HumanEval benchmark ($\mathbf{90.9\%}$ vs. $\mathbf{90.2\%}$). In addition, AutoCoder offers a more versatile code interpreter than GPT-4 Turbo and GPT-4o: its code interpreter can install external packages instead of being limited to built-in packages. AutoCoder's training data is a multi-turn dialogue dataset created by a system that combines agent interaction with external code execution verification, a method we term \textbf{\textsc{AIEV-Instruct}} (Instruction Tuning with Agent-Interaction and Execution-Verified). Compared to previous large-scale code dataset generation methods, \textsc{AIEV-Instruct} reduces dependence on proprietary large models and provides an execution-validated code dataset. The code and the demo video are available at \url{https://github.com/bin123apple/AutoCoder}.
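
To make the idea of execution verification concrete, the following is a minimal sketch, not the authors' pipeline. It assumes each candidate sample is a dictionary with hypothetical `solution` and `tests` fields, runs the two together in a fresh Python process, and keeps only the samples whose tests pass. The function names `passes_tests` and `filter_verified` are illustrative, and the agent-interaction part of AIEV-Instruct is not modeled here.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def passes_tests(solution: str, tests: str, timeout: float = 10.0) -> bool:
    """Run the solution followed by its tests in a separate Python process.

    Returns True only if the process exits with code 0 within the timeout.
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(solution + "\n\n" + tests + "\n", encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            # A hanging candidate is treated as a failed verification.
            return False
        return result.returncode == 0


def filter_verified(samples):
    """Keep only the samples whose tests pass when executed."""
    return [s for s in samples if passes_tests(s["solution"], s["tests"])]


# Two toy candidates: the first is correct, the second has a bug.
samples = [
    {"solution": "def add(a, b):\n    return a + b", "tests": "assert add(2, 3) == 5"},
    {"solution": "def add(a, b):\n    return a - b", "tests": "assert add(2, 3) == 5"},
]
verified = filter_verified(samples)
print(len(verified))
```

Running each candidate in a subprocess with a timeout keeps a crashing or non-terminating sample from affecting the filtering loop; a real system would likely add stronger sandboxing, which this sketch omits.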

Paper Structure (13 sections, 1 equation, 6 figures, 3 tables)

This paper contains 13 sections, 1 equation, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Pass@1 ($\%$) comparison of various LLMs on the HumanEval Base Test.
  • Figure 2: Comparison of code interpreter functions between AutoCoder and GPT-4o. The legend distinguishes natural language generated by the model from code generated by the model. AutoCoder can recognize external package installation commands, whereas GPT-4o can only run code that uses built-in packages. The demo video is at https://github.com/bin123apple/AutoCoder.
  • Figure 3: The overall architecture of AIEV-Instruct.
  • Figure 4: Comparison between AutoCoder-AIEV-Instruct and other large code datasets.
  • Figure 5: AutoCoder-AIEV-Instruct dataset post-processing. The legend distinguishes natural language, code execution requests from the User, code execution request responses from the Assistant, Bash commands, code blocks, special tokens, and execution results.
  • ...and 1 more figure