R&D-Agent: An LLM-Agent Framework Towards Autonomous Data Science

Xu Yang, Xiao Yang, Shikai Fang, Yifei Zhang, Jian Wang, Bowen Xian, Qizheng Li, Jingyuan Li, Minrui Xu, Yuante Li, Haoran Pan, Yuge Zhang, Weiqing Liu, Yelong Shen, Weizhu Chen, Jiang Bian

TL;DR

R&D-Agent tackles the data-science automation bottleneck by formalizing machine learning engineering (MLE) into two phases, Research and Development, supported by four and two modular components, respectively, and driven by two specialized agents (Researcher and Developer). The framework enables principled exploration of the large MLE design space, and a configuration inspired by human experts achieves state-of-the-art performance on MLE-Bench under constrained resources. Ablations show that both the phased architecture and the individual components (notably planning, adaptive exploration, memory context, coding workflow, and evaluation strategy) contribute to the gains. By open-sourcing the framework and its best-performing configurations, the work provides a practical, reusable platform for autonomous data science on Kaggle-like tasks and real-world data challenges.

Abstract

Recent advances in AI and ML have transformed data science, yet increasing complexity and expertise requirements continue to hinder progress. Although crowd-sourcing platforms alleviate some challenges, high-level machine learning engineering (MLE) tasks remain labor-intensive and iterative. We introduce R&D-Agent, a comprehensive, decoupled, and extensible framework that formalizes the MLE process. R&D-Agent divides the MLE workflow into two phases and six components, turning agent design for MLE from ad-hoc craftsmanship into a principled, testable process. Although several existing agents report promising gains on their chosen components, they can mostly be summarized as partial optimizations of our framework's simple baseline. Inspired by human experts, we designed efficient and effective agents within this framework that achieve state-of-the-art performance. Evaluated on MLE-Bench, the agent built on R&D-Agent ranks as the top-performing machine learning engineering agent, achieving a 35.1% any-medal rate, demonstrating the ability of the framework to speed up innovation and improve accuracy across a wide range of data science applications. We have open-sourced R&D-Agent on GitHub: https://github.com/microsoft/RD-Agent.

Paper Structure

This paper contains 42 sections, 3 equations, 5 figures, 10 tables, 2 algorithms.

Figures (5)

  • Figure 1: Agent performance on MLE-Bench. Stacked bars show any medal rates for Low==Lite (22 tasks), Medium (38 tasks), and High (15 tasks) complexity levels. The dashed line indicates overall performance (mean $\pm$ SEM). R&D-Agent achieves SOTA performance at 35.1 $\pm$ 0.4%. * indicates our re-evaluation of ML-Master within our environment.
  • Figure 2: Framework of R&D-Agent. R&D-Agent works in an iterative loop in which the Research Agent proposes ideas and the Development Agent implements them into runnable solutions to obtain feedback from data. By decoupling high-level research from low-level implementation, the framework efficiently explores the solution space through parallel exploration paths and iterative refinement, progressively converging on optimal solutions.
  • Figure 3: Development Phase Temporal Ablation. Medal acquisition rate (percentage of 75 competitions) over 12 hours reveals when and how each component contributes.
  • Figure 4: Performance comparison of R&D-Agent across different backend LLM configurations. The hybrid configuration achieves superior performance by leveraging specialized models for each phase.
  • Figure 5: Agent performance on MLE-Bench (Lite). Each value represents the mean performance across all benchmark tasks, with the value after "$\pm$" indicating SEM. For the AIRA agents, the reported value is 0 because the original paper did not provide explicit results.
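
The captions above report results as a mean any-medal rate followed by "$\pm$" and the SEM. As a rough illustration of how such a figure can be computed, here is a minimal sketch that assumes one any-medal rate per independent run and takes the SEM as the sample standard deviation divided by the square root of the number of runs. The function names and the medal outcomes are hypothetical and purely illustrative; they are not taken from the paper or from MLE-Bench.

```python
import math
from statistics import mean, stdev


def any_medal_rate(medals):
    """Percentage of tasks in a single run that earned any medal."""
    return 100.0 * sum(1 for m in medals if m) / len(medals)


def mean_and_sem(values):
    """Return (mean, SEM), where SEM = sample std / sqrt(number of runs)."""
    if len(values) < 2:
        raise ValueError("need at least two runs to estimate the SEM")
    return mean(values), stdev(values) / math.sqrt(len(values))


# Hypothetical outcomes: 3 runs over 4 tasks (True = any medal earned).
# These values are made up for illustration only.
runs = [
    [True, False, False, True],
    [True, False, False, False],
    [True, True, False, True],
]

rates = [any_medal_rate(run) for run in runs]
avg, sem = mean_and_sem(rates)
print(f"any-medal rate: {avg:.1f} ± {sem:.1f}%")
```

Whether the paper aggregates per run or per task before computing the SEM is not stated in these captions, so the per-run aggregation here is an assumption.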