The FM Agent

Annan Li, Chufan Wu, Zengle Ge, Yee Hin Chong, Zhinan Hou, Lizhe Cao, Cheng Ju, Jianmin Wu, Huaiming Li, Haobo Zhang, Shenghao Feng, Mo Zhao, Fengzhi Qiu, Rui Yang, Mengmeng Zhang, Wenyi Zhu, Yingying Sun, Quan Sun, Shunhao Yan, Danyu Liu, Dawei Yin, Dou Shen

TL;DR

The paper introduces FM Agent, a general-purpose multi-agent framework that unifies LLM-based reasoning with large-scale evolutionary search to automate complex discovery across machine learning, combinatorial optimization, GPU kernel optimization, and mathematics. Its four innovations (cold-start initialization, adaptive diversity-driven sampling, domain-specific evaluators, and a Ray-based distributed infrastructure) enable autonomous, scalable, self-improving problem solving. On ALE-Bench, MLE-Bench, and KernelBench, FM Agent achieves state-of-the-art (SOTA) results and substantial speedups, and it sets new SOTA results on several classical mathematical problems. The authors expect the framework to accelerate industrial R&D workflows and to enable autonomous scientific discovery with reduced human intervention.

Abstract

Large language models (LLMs) are catalyzing the development of autonomous AI research agents for scientific and engineering discovery. We present FM Agent, a novel and general-purpose multi-agent framework that leverages a synergistic combination of LLM-based reasoning and large-scale evolutionary search to address complex real-world challenges. The core of FM Agent integrates several key innovations: 1) a cold-start initialization phase incorporating expert guidance, 2) a novel evolutionary sampling strategy for iterative optimization, 3) domain-specific evaluators that combine correctness, effectiveness, and LLM-supervised feedback, and 4) a distributed, asynchronous execution infrastructure built on Ray. Demonstrating broad applicability, our system has been evaluated across diverse domains, including operations research, machine learning, GPU kernel optimization, and classical mathematical problems. FM Agent reaches state-of-the-art results autonomously, without human interpretation or tuning: 1976.3 on ALE-Bench (+5.2%), 43.56% on MLE-Bench (+4.0pp), and up to 20x speedups on KernelBench. It also establishes new state-of-the-art (SOTA) results on several classical mathematical problems. Beyond academic benchmarks, FM Agent shows considerable promise for both large-scale enterprise R&D workflows and fundamental scientific research, where it can accelerate innovation, automate complex discovery processes, and deliver substantial engineering and scientific advances with broader societal impact.
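
To make the cold-start, sampling, and evaluator stages named in the abstract more concrete, the following is a minimal sketch of a generic steady-state evolutionary loop. It is not FM Agent's implementation: the toy objective, the function names (`evaluate`, `cold_start`, `sample_parent`, `mutate`, `evolve`), and every parameter are assumptions made for illustration. A Gaussian perturbation stands in for the LLM-generated program edit, a single numeric score stands in for the domain-specific evaluator, and the loop runs sequentially rather than on a Ray cluster.

```python
import random


def evaluate(candidate):
    """Toy evaluator (assumption): higher is better, with the peak at 3.0."""
    return -(candidate - 3.0) ** 2


def cold_start(rng, size):
    """Seed the population with spread-out initial candidates."""
    return [rng.uniform(-10.0, 10.0) for _ in range(size)]


def sample_parent(rng, population, scores, explore_prob):
    """Mix exploration (uniform pick) with exploitation (binary tournament)."""
    if rng.random() < explore_prob:
        return rng.choice(population)
    i = rng.randrange(len(population))
    j = rng.randrange(len(population))
    return population[i] if scores[i] >= scores[j] else population[j]


def mutate(rng, candidate, step):
    """Stand-in for an LLM-proposed modification: a Gaussian perturbation."""
    return candidate + rng.gauss(0.0, step)


def evolve(generations=200, size=16, explore_prob=0.2, step=0.5, seed=0):
    """Run a steady-state loop: sample, mutate, evaluate, replace the worst."""
    rng = random.Random(seed)
    population = cold_start(rng, size)
    scores = [evaluate(c) for c in population]
    for _ in range(generations):
        parent = sample_parent(rng, population, scores, explore_prob)
        child = mutate(rng, parent, step)
        child_score = evaluate(child)
        worst = min(range(size), key=scores.__getitem__)
        # Keep the child only if it beats the current worst member.
        if child_score > scores[worst]:
            population[worst] = child
            scores[worst] = child_score
    best = max(range(size), key=scores.__getitem__)
    return population[best], scores[best]


if __name__ == "__main__":
    best, score = evolve()
    print(f"best candidate: {best:.3f}, score: {score:.5f}")
```

Because a child only ever replaces the worst member, the best score in the population never decreases. The `explore_prob` parameter is one simple way to trade exploitation against diversity; the paper's adaptive sampling strategy is not reproduced here.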

Paper Structure

This paper contains 26 sections, 5 equations, 11 figures, 6 tables.

Figures (11)

  • Figure 2: Performance of Agents on MLE-Bench: Medal Rate (%), evaluating FM Agent across real-world machine learning tasks sourced from Kaggle competitions.
  • Figure 3: Performance of Agents on ALE-Bench Lite, showing the SOTA capability of FM Agent in tackling challenging heuristic-driven tasks from AtCoder competitions.
  • Figure 4: Framework of FM Agent with the Cold Start Stage and the Evolve Stage, both of which contribute to the final performance.
  • Figure 5: Architecture of the Large-Scale Distributed Evolutionary Cluster.
  • Figure 6: Comparison of speedup achieved relative to torch.compile. The dashed line at 1 indicates parity with torch.compile.
  • ...and 6 more figures