
AIDE: AI-Driven Exploration in the Space of Code

Zhengyao Jiang, Dominik Schmidt, Dhruv Srikanth, Dixing Xu, Ian Kaplan, Deniss Jacenko, Yuxiang Wu

TL;DR

Machine learning engineering currently relies on labor-intensive trial-and-error, creating bottlenecks in model development. The authors propose AIDE, an LLM-powered agent that reframes ML engineering as a code-space optimization problem and uses a tree-structured solution space with drafting, debugging, and improving operators guided by evaluations. Empirical results on Weco-Kaggle, OpenAI MLE-Bench, and METR RE-Bench show AIDE achieving state-of-the-art or competitive performance, often surpassing baselines and sometimes human experts within constrained time. The work demonstrates that structured, incremental improvements to code via LLMs can yield substantial gains in automation efficiency and performance, with broad applicability to AutoML, NAS, and AI R&D tasks. Overall, AIDE offers a principled, scalable framework for automated ML engineering that leverages code-space search to improve sample efficiency and resource use.
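The tree-structured search with drafting, debugging, and improving operators can be illustrated with a minimal Python sketch. It is not AIDE's implementation. The `Node` fields, the `select_action` policy, and the `num_drafts` and `max_debug_depth` thresholds are hypothetical. The sketch assumes a greedy rule: draft until a few root solutions exist, then repair buggy leaves, and otherwise refine the best-scoring working node. It also assumes that a higher metric is better.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)
class Node:
    """One candidate solution (a Python script) in the solution tree.

    Hypothetical structure for illustration; not AIDE's actual class.
    """

    code: str
    parent: Optional["Node"] = field(default=None, repr=False)
    metric: Optional[float] = None  # validation score; assumes higher is better
    is_buggy: bool = False
    children: List["Node"] = field(default_factory=list, repr=False)


def debug_depth(node: Optional[Node]) -> int:
    """Count consecutive buggy nodes on the path from `node` up toward the root."""
    depth = 0
    while node is not None and node.is_buggy:
        depth += 1
        node = node.parent
    return depth


def select_action(
    nodes: List[Node], num_drafts: int = 3, max_debug_depth: int = 3
) -> Tuple[str, Optional[Node]]:
    """Choose the next operator (draft, debug, or improve) and its target node.

    Hypothetical greedy policy; thresholds are illustrative assumptions.
    """
    # 1. Draft: keep adding independent root solutions until there are enough.
    if sum(1 for n in nodes if n.parent is None) < num_drafts:
        return "draft", None
    # 2. Debug: repair a buggy leaf unless its chain of failed fixes is too long.
    for n in nodes:
        if n.is_buggy and not n.children and debug_depth(n) <= max_debug_depth:
            return "debug", n
    # 3. Improve: greedily refine the best-scoring working solution.
    working = [n for n in nodes if not n.is_buggy and n.metric is not None]
    if not working:
        return "draft", None
    return "improve", max(working, key=lambda n: n.metric)
```

Each call picks one operator. An outer loop would then ask the LLM to apply that operator, run the resulting script, and attach the new node to the tree.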

Abstract

Machine learning, the foundation of modern artificial intelligence, has driven innovations that have fundamentally transformed the world. Yet behind these advancements lies a complex and often tedious process that requires labor- and compute-intensive iteration and experimentation. Engineers and scientists developing machine learning models spend much of their time on trial-and-error tasks instead of conceptualizing innovative solutions or research hypotheses. To address this challenge, we introduce AI-Driven Exploration (AIDE), a machine learning engineering agent powered by large language models (LLMs). AIDE frames machine learning engineering as a code optimization problem and formulates trial-and-error as a tree search in the space of potential solutions. By strategically reusing and refining promising solutions, AIDE effectively trades computational resources for enhanced performance. It achieves state-of-the-art results on multiple machine learning engineering benchmarks, including our Kaggle evaluations, OpenAI's MLE-Bench, and METR's RE-Bench.
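A toy greedy search can make the idea of trading computational resources for performance concrete. This sketch replaces real scripts and validation runs with a one-dimensional "solution" and a quadratic score. `propose_child` stands in for the LLM coding operator (called $f$ in Figure 1), and `evaluate` stands in for executing a script. Both are hypothetical. The search never discards the best node found so far, so the best score can only stay level or rise as the step budget grows.

```python
import random
from typing import List, Tuple


def evaluate(x: float) -> float:
    """Toy stand-in (hypothetical) for running a script and reading its score."""
    return -(x - 3.0) ** 2  # best possible score is 0.0, at x = 3.0


def propose_child(x: float, rng: random.Random) -> float:
    """Toy stand-in (hypothetical) for the LLM coding operator: perturb a parent."""
    return x + rng.gauss(0.0, 0.5)


def greedy_tree_search(budget: int, seed: int = 0) -> List[float]:
    """Run `budget` improve steps, always expanding the best node so far.

    Returns the best score seen after each step.
    """
    rng = random.Random(seed)
    # Each node is (solution, score); parent links are omitted for brevity.
    nodes: List[Tuple[float, float]] = [(0.0, evaluate(0.0))]  # initial draft
    best_history: List[float] = []
    for _ in range(budget):
        # Greedy selection: refine the highest-scoring node in the tree.
        parent_solution, _score = max(nodes, key=lambda node: node[1])
        child = propose_child(parent_solution, rng)
        nodes.append((child, evaluate(child)))
        best_history.append(max(score for _, score in nodes))
    return best_history
```

A greedy rule like this exploits quickly but can stall in a local optimum. The Figure 4 caption attributes the same limitation to AIDE on challenging R&D tasks.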


Paper Structure

This paper contains 33 sections, 1 equation, 7 figures, 4 tables, 1 algorithm.

Figures (7)

  • Figure 1: A sample solution tree $T$ for AIDE, where each node is a Python script. Arrows represent transitions proposed by the coding operator $f$. Some branches terminate in a bug, while others lead to improved or optimal solutions.
  • Figure 2: AIDE's performance distribution on full Weco-Kaggle benchmark. Exceeds % of Humans values are estimated from the leaderboard distribution.
  • Figure 3: Performance of o1-preview with and without AIDE on the MLE-bench Lite (complexity = low) set.
  • Figure 4: Average score achieved by AIDE+o1-preview and top human scientists on 7 AI R&D tasks, as reported by RE-Bench. AIDE surpassed the human scientists within six hours by enabling faster experiment iterations. However, the human scientists eventually caught up, because AIDE adopts a simple greedy policy that may lead to local optima on challenging R&D tasks.
  • Figure 5: The task descriptor for bike-sharing-demand
  • ...and 2 more figures
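The Figure 2 caption says only that the "Exceeds % of Humans" values are estimated from the leaderboard distribution. The exact procedure is not given here. One plausible reading, treated here as an assumption, is the percentage of leaderboard entries that the agent's score strictly beats. A flag handles metrics where lower is better, such as RMSE. The function name `exceeds_pct_of_humans` is hypothetical.

```python
from bisect import bisect_left
from typing import Sequence


def exceeds_pct_of_humans(
    agent_score: float, leaderboard: Sequence[float], higher_is_better: bool = True
) -> float:
    """Percentage of leaderboard entries that `agent_score` strictly beats.

    Hypothetical helper with an assumed definition; ties are not counted as beaten.
    """
    if not leaderboard:
        raise ValueError("leaderboard must be non-empty")
    # Flip signs for lower-is-better metrics so that "higher" always wins.
    sign = 1.0 if higher_is_better else -1.0
    ranked = sorted(sign * s for s in leaderboard)
    # bisect_left counts the entries strictly below the agent's (signed) score.
    beaten = bisect_left(ranked, sign * agent_score)
    return 100.0 * beaten / len(ranked)
```

Whether ties count as beaten is a design choice. This version takes the conservative option and excludes them.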