CoMind: Towards Community-Driven Agents for Machine Learning Engineering

Sijie Li, Weiwei Sun, Shanda Li, Ameet Talwalkar, Yiming Yang

TL;DR

CoMind tackles the challenge of community-driven ML engineering by introducing MLE-Live, a live evaluation framework that simulates Kaggle-like public knowledge exchanges, and a five-role multi-agent system that iteratively leverages external knowledge. By coordinating a Central Coordinator, Analyzer, Idea Proposer, Coding Agent, and Evaluator within a simulated community, CoMind achieves state-of-the-art medal performance on retrospective tasks and strong live competition standings, outpacing most human competitors on eight ongoing Kaggle contests. The framework demonstrates that continuous knowledge accumulation and collaborative exploration can push ML engineering solutions beyond isolated, single-agent approaches, offering a scalable paradigm for research automation. The work suggests broad applicability to real-world scientific and engineering domains while outlining limitations and directions for future expansion and deeper analysis.

Abstract

Large language model (LLM) agents show promise in automating machine learning (ML) engineering. However, existing agents typically operate in isolation on a given research problem, without engaging with the broader research community, where human researchers often gain insights and contribute by sharing knowledge. To bridge this gap, we introduce MLE-Live, a live evaluation framework designed to assess an agent's ability to communicate with and leverage collective knowledge from a simulated Kaggle research community. Building on this framework, we propose CoMind, a multi-agent system designed to actively integrate external knowledge. CoMind employs an iterative parallel exploration mechanism, developing multiple solutions simultaneously to balance exploratory breadth with implementation depth. On 75 past Kaggle competitions within our MLE-Live framework, CoMind achieves a 36% medal rate, establishing a new state of the art. Critically, when deployed in eight live, ongoing competitions, CoMind outperforms 92.6% of human competitors on average, placing in the top 5% on three official leaderboards and the top 1% on one.

Paper Structure

This paper contains 89 sections, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Left: CoMind's win rates on eight ongoing Kaggle competitions compared with the public best entry. Right: Any Medal results on 75 MLE-Bench competitions grouped by task difficulty level. CoMind achieves state-of-the-art performance on MLE-Bench compared to strong baselines.
  • Figure 2: Overview of CoMind. Specialized agents (Coordinator, Analyzer, Idea Proposer, Coding Agent, Evaluator) interact with a simulated Kaggle community of kernels, datasets, and discussions.
  • Figure 3: Left: Score distributions across participants in eight ongoing Kaggle competitions. Each curve shows the relationship between leaderboard rank (x-axis, inverted) and competition score (y-axis). Vertical lines indicate CoMind's position (red) and public best performance (yellow). Right: Results on eight ongoing Kaggle competitions. Reported are leaderboard rank, total teams, and percentile rank (Top %, where lower means better standing).
  • Figure 4: Performance of CoMind and other baselines on 20 competitions from MLE-Bench-Lite. Valid Submission is the ratio of submissions meeting format requirements and validation criteria. Win Rate is the percentage of human competitors outperformed by the agent. Any Medal is the proportion of competitions where the agent earned a Gold, Silver, or Bronze medal. Above Median is the fraction of competitions where the agent's score strictly exceeded that of the median human competitor.
  • Figure 5: Win rate over time. CoMind sustains improvement while baselines plateau.
  • ...and 4 more figures
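
The metrics named in the Figure 3 and Figure 4 captions (Top %, Valid Submission, Win Rate, Any Medal, Above Median) can be illustrated with a minimal Python sketch. This is not the authors' evaluation code. The `CompetitionResult` record, its field names, and the use of a single score threshold to decide whether a medal was earned are all assumptions made for illustration, and an invalid submission is assumed to count as a 0% win rate.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, Optional, Sequence


@dataclass
class CompetitionResult:
    """One agent run on one competition (hypothetical record layout)."""
    agent_score: Optional[float]   # None means the submission was invalid
    human_scores: Sequence[float]  # leaderboard scores of human competitors
    medal_threshold: float         # assumed score needed for at least Bronze
    higher_is_better: bool = True  # False for error-style metrics


def _beats(a: float, b: float, higher_is_better: bool) -> bool:
    """True if score a is strictly better than score b."""
    return a > b if higher_is_better else a < b


def win_rate(r: CompetitionResult) -> float:
    """Fraction of human competitors the agent strictly outperformed."""
    if r.agent_score is None or not r.human_scores:
        return 0.0
    beaten = sum(_beats(r.agent_score, h, r.higher_is_better)
                 for h in r.human_scores)
    return beaten / len(r.human_scores)


def any_medal(r: CompetitionResult) -> bool:
    """True if the agent's score meets or beats the assumed medal threshold."""
    if r.agent_score is None:
        return False
    return not _beats(r.medal_threshold, r.agent_score, r.higher_is_better)


def above_median(r: CompetitionResult) -> bool:
    """True if the agent's score is strictly better than the human median."""
    if r.agent_score is None or not r.human_scores:
        return False
    return _beats(r.agent_score, median(r.human_scores), r.higher_is_better)


def top_percent(rank: int, total_teams: int) -> float:
    """Percentile rank as in Figure 3: lower means better standing."""
    return 100.0 * rank / total_teams


def summarize(results: Sequence[CompetitionResult]) -> Dict[str, float]:
    """Aggregate the per-competition metrics over a benchmark."""
    n = len(results)
    return {
        "valid_submission": sum(r.agent_score is not None for r in results) / n,
        "win_rate": sum(win_rate(r) for r in results) / n,
        "any_medal": sum(any_medal(r) for r in results) / n,
        "above_median": sum(above_median(r) for r in results) / n,
    }
```

Calling `summarize` on a list of such records yields one number per metric. Note that Above Median uses a strict comparison, matching the caption's wording that the agent's score "strictly exceeded" the median.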