KompeteAI: Accelerated Autonomous Multi-Agent System for End-to-End Pipeline Generation for Machine Learning Problems

Stepan Kulibaba, Artem Dzhalilov, Roman Pakhomov, Oleg Svidchenko, Alexander Gasnikov, Aleksei Shpilman

TL;DR

KompeteAI tackles two core problems in LLM-driven AutoML: limited exploration and expensive execution cycles. It introduces a stage-decomposed, multi-agent framework with adding and merging operators, Retrieval-Augmented Generation, a predictive scoring model, and accelerated debugging to enable rapid, diverse pipeline generation. Empirical results show state-of-the-art performance on MLE-Bench Lite and Kompete-bench, along with significant acceleration in evaluation throughput. The work also reveals biases in existing benchmarks and proposes Kompete-bench to provide a fairer, real-world-inspired testbed, underscoring the practical impact of automated, knowledge-augmented, and efficiently validated AutoML pipelines.

Abstract

Recent Large Language Model (LLM)-based AutoML systems demonstrate impressive capabilities but face significant limitations such as constrained exploration strategies and a severe execution bottleneck. Exploration is hindered by one-shot methods lacking diversity and Monte Carlo Tree Search (MCTS) approaches that fail to recombine strong partial solutions. The execution bottleneck arises from lengthy code validation cycles that stifle iterative refinement. To overcome these challenges, we introduce KompeteAI, a novel AutoML framework with dynamic solution space exploration. Unlike previous MCTS methods that treat ideas in isolation, KompeteAI introduces a merging stage that composes top candidates. We further expand the hypothesis space by integrating Retrieval-Augmented Generation (RAG), sourcing ideas from Kaggle notebooks and arXiv papers to incorporate real-world strategies. KompeteAI also addresses the execution bottleneck via a predictive scoring model and an accelerated debugging method, assessing solution potential using early-stage metrics to avoid costly full-code execution. This approach accelerates pipeline evaluation by a factor of 6.9. KompeteAI outperforms leading methods (e.g., RD-agent, AIDE, and ML-Master) by an average of 3% on the primary AutoML benchmark, MLE-Bench. Additionally, we propose Kompete-bench to address limitations in MLE-Bench, where KompeteAI also achieves state-of-the-art results.

Paper Structure

This paper contains 30 sections, 3 equations, 5 figures, 12 tables, 2 algorithms.

Figures (5)

  • Figure 1: The KompeteAI AutoML pipeline.
  • Figure 2: Comparison of our pipeline with AIDE, RD-agent and ML-Master on Contemporary and MLE-Bench parts of Kompete-bench. All systems use gemini-2.5-flash as the underlying LLM, except for ML-Master, which uses deepseek-r1. Each was run 3 times with different seeds, results are averaged, and each run was limited to 6 hours.
  • Figure A3: End dates for competitions presented in Kompete-bench.
  • Figure A4: Comparison between predicted scores and actual validation scores after min-max normalization within each competition. Each point corresponds to a solution from a distinct competition. The solid black diagonal represents the ideal case where predicted and actual scores are perfectly aligned.
  • Figure A5: Average outcomes of the debugging process. The chart illustrates the proportion of successfully integrated ideas, categorized by their target AutoML stage — Model Training (MT) or Feature Engineering (FE) — versus the proportion of ideas that were ultimately discarded as "Failed".
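
The Figure A4 caption refers to min-max normalization applied within each competition. A minimal sketch of that kind of per-group rescaling is given below. The function name, the (competition, score) record layout, and the choice to map a constant-score group to 0.0 are assumptions made for illustration; the paper's own procedure may differ.

```python
from collections import defaultdict


def min_max_by_competition(records):
    """Rescale scores to [0, 1] separately for each competition.

    `records` is a list of (competition_id, score) pairs (a hypothetical
    layout). Returns the normalized scores in the same order as the input.
    """
    # Collect the scores that belong to each competition.
    grouped = defaultdict(list)
    for competition, score in records:
        grouped[competition].append(score)

    # Per-competition minimum and maximum.
    bounds = {c: (min(v), max(v)) for c, v in grouped.items()}

    normalized = []
    for competition, score in records:
        low, high = bounds[competition]
        if high == low:
            # Assumption: a group with no spread is mapped to 0.0
            # to avoid division by zero.
            normalized.append(0.0)
        else:
            normalized.append((score - low) / (high - low))
    return normalized
```

Normalizing within each competition puts metrics with different ranges on a common scale, which is what makes it possible to compare predicted and actual scores across competitions against a single diagonal.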