AceGRPO: Adaptive Curriculum Enhanced Group Relative Policy Optimization for Autonomous Machine Learning Engineering
Yuzhu Cai, Zexi Liu, Xinyu Zhu, Cheng Wang, Jiaao Chen, Hanrui Wang, Wei-Chen Wang, Di Jin, Siheng Chen
TL;DR
AceGRPO addresses long-horizon autonomous Machine Learning Engineering by letting an LLM learn from its own trial and error through an evolving data buffer and a curriculum-guided adaptive sampling strategy. It reframes optimization as step-wise learning over a dynamically expanding task distribution and uses a Learnability Potential score to prioritize informative states, maximizing gradient signal under limited compute. The resulting Ace-30B model reaches a 100% valid submission rate on MLE-Bench-Lite, with medal and HumanRank performance approaching or surpassing larger frontier models and outperforming larger open-source baselines. This demonstrates sustained self-evolution and practical potential for deploying autonomous ML agents in iterative engineering tasks.
Abstract
Autonomous Machine Learning Engineering (MLE) requires agents to perform sustained, iterative optimization over long horizons. While recent LLM-based agents show promise, current prompt-based agents for MLE suffer from behavioral stagnation because their parameters are frozen. Reinforcement Learning (RL) offers a remedy, but applying it to MLE is hindered by prohibitive execution latency and inefficient data selection. To address these challenges, we propose AceGRPO with two core components: (1) an Evolving Data Buffer that continuously repurposes execution traces into reusable training tasks, and (2) Adaptive Sampling guided by a Learnability Potential function, which dynamically prioritizes tasks at the agent's learning frontier to maximize learning efficiency. Leveraging AceGRPO, our trained Ace-30B model achieves a 100% valid submission rate on MLE-Bench-Lite, approaches the performance of proprietary frontier models, and outperforms larger open-source baselines (e.g., DeepSeek-V3.2), demonstrating robust capability for sustained iterative optimization. Code is available at https://github.com/yuzhu-cai/AceGRPO.
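
To make the two components concrete, the following is a minimal Python sketch of a buffer that grows from execution traces and is sampled by a learnability score. It is not the authors' implementation: the abstract does not define the Learnability Potential function, so the sketch assumes a simple proxy, p(1 - p), where p is a task's empirical success rate under a binary reward. The names `Task`, `EvolvingBuffer`, `learnability`, `add_from_trace`, and the `floor` parameter are all hypothetical. The proxy is motivated by how group-relative methods work: when every rollout in a group receives the same reward, the group-normalized advantages are zero, so tasks the agent always solves or always fails contribute no gradient.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Task:
    """A training task, e.g. a state reached in an earlier rollout (hypothetical)."""
    task_id: str
    rewards: list = field(default_factory=list)  # binary outcomes of past attempts

    def success_rate(self) -> float:
        # Untried tasks default to 0.5, i.e. they are treated as maximally uncertain.
        return sum(self.rewards) / len(self.rewards) if self.rewards else 0.5


def learnability(task: Task) -> float:
    """Assumed proxy for Learnability Potential: variance of a binary reward."""
    p = task.success_rate()
    return p * (1.0 - p)


class EvolvingBuffer:
    """Task pool that grows from traces and is sampled by learnability (sketch)."""

    def __init__(self, seed_tasks, floor=1e-3, seed=0):
        self.tasks = {t.task_id: t for t in seed_tasks}
        self.floor = floor  # small weight so no task is starved entirely
        self.rng = random.Random(seed)

    def add_from_trace(self, parent_id: str, step_index: int) -> str:
        """Repurpose an intermediate step of an execution trace as a new task."""
        new_id = f"{parent_id}/step{step_index}"
        self.tasks.setdefault(new_id, Task(new_id))
        return new_id

    def record(self, task_id: str, reward: int) -> None:
        """Store the outcome of one attempt on a task."""
        self.tasks[task_id].rewards.append(reward)

    def sample(self, k: int):
        """Draw k tasks with probability proportional to learnability + floor."""
        tasks = list(self.tasks.values())
        weights = [learnability(t) + self.floor for t in tasks]
        return self.rng.choices(tasks, weights=weights, k=k)


if __name__ == "__main__":
    buf = EvolvingBuffer([
        Task("easy", [1, 1, 1, 1]),      # always solved: no learning signal
        Task("hard", [0, 0, 0, 0]),      # never solved: no learning signal
        Task("frontier", [1, 0, 1, 0]),  # mixed outcomes: most informative
    ])
    buf.add_from_trace("frontier", 3)
    for task in buf.sample(5):
        print(task.task_id, round(learnability(task), 3))
```

Under this assumed score, sampling concentrates on tasks with mixed outcomes and on freshly added tasks, which is one plausible reading of prioritizing tasks at the agent's learning frontier. The released code may define the score and the buffer update differently.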
