
Reasoning as Gradient: Scaling MLE Agents Beyond Tree Search

Yifei Zhang, Xu Yang, Xiao Yang, Bowen Xian, Qizheng Li, Shikai Fang, Jingyuan Li, Jian Wang, Mingrui Xu, Weiqing Liu, Jiang Bian

TL;DR

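This work introduces Gome, an MLE agent that operationalizes gradient-based optimization as an alternative to tree search, a form of gradient-free optimization that uses scalar validation scores to rank candidates. Its advantage over search-based agents grows as the reasoning capability of the underlying LLM improves.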

Abstract

LLM-based agents for machine learning engineering (MLE) predominantly rely on tree search, a form of gradient-free optimization that uses scalar validation scores to rank candidates. As LLM reasoning capabilities improve, exhaustive enumeration becomes increasingly inefficient compared to directed updates, analogous to how accurate gradients enable efficient descent over random search. We introduce Gome, an MLE agent that operationalizes gradient-based optimization. Gome maps structured diagnostic reasoning to gradient computation, success memory to momentum, and multi-trace execution to distributed optimization. Under a closed-world protocol that isolates architectural effects from external knowledge, Gome achieves a state-of-the-art 35.1% any-medal rate on MLE-Bench with a restricted 12-hour budget on a single V100 GPU. Scaling experiments across 10 models reveal a critical crossover: with weaker models, tree search retains advantages by compensating for unreliable reasoning through exhaustive exploration; as reasoning capability strengthens, gradient-based optimization progressively outperforms tree search, with the gap widening at frontier-tier models. Given the rapid advancement of reasoning-oriented LLMs, this positions gradient-based optimization as an increasingly favorable paradigm. We release our codebase and GPT-5 traces.
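
The analogy the abstract leans on (directed updates versus score-ranked candidates) can be made concrete with a purely numerical toy. The sketch below is not Gome and is not taken from the paper's codebase: the quadratic objective, the function names, the step counts, and the hyperparameters are all illustrative assumptions. It contrasts a gradient-free loop that only compares scalar scores with a momentum update that uses the direction of improvement.

```python
import random

def loss(x):
    # Toy objective standing in for a validation error; minimum at x = 3 (assumed).
    return (x - 3.0) ** 2

def grad(x):
    # Exact gradient of the toy objective.
    return 2.0 * (x - 3.0)

def random_search(x0, steps, rng, scale=1.0):
    # Gradient-free: propose a nearby candidate and keep it only if
    # its scalar score improves on the best score seen so far.
    x, best = x0, loss(x0)
    for _ in range(steps):
        cand = x + rng.uniform(-scale, scale)
        score = loss(cand)
        if score < best:
            x, best = cand, score
    return x

def momentum_descent(x0, steps, lr=0.1, beta=0.5):
    # Gradient-based: each step moves in a computed direction, and the
    # velocity term v carries over part of the previous update (momentum).
    x, v = x0, 0.0
    for _ in range(steps):
        v = beta * v - lr * grad(x)
        x = x + v
    return x

rng = random.Random(0)  # fixed seed so the comparison is repeatable
x_rs = random_search(-10.0, steps=20, rng=rng)
x_gd = momentum_descent(-10.0, steps=20)
print(f"random search loss:    {loss(x_rs):.6f}")
print(f"momentum descent loss: {loss(x_gd):.6f}")
```

With the same 20-step budget, the momentum update ends much closer to the minimum on this toy problem because every step is directed. This only illustrates the numerical analogy; it says nothing about how an LLM's diagnostic reasoning behaves as a gradient signal, which is the paper's empirical question.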

Paper Structure (74 sections, 20 equations, 7 figures, 17 tables)

This paper contains 74 sections, 20 equations, 7 figures, 17 tables.

Figures (7)

  • Figure 1: Comparison of search-based (gradient-free) and gradient-based optimization paradigms. (a) Tree search uses scalar scores to decide which branch to expand. (b) Gradient-based optimization uses reasoning to decide how to update; stronger reasoning yields more accurate gradient signals.
  • Figure 2: Overview of the Gome framework. Multiple traces optimize in parallel, synchronizing through a global shared success memory $\mathcal{M}$. Each trace iteratively executes solutions, validates improvements, updates shared memory, and reasons over local and global feedback to generate the next hypothesis. Right panels illustrate the gradient-based optimization analogy and key module details.
  • Figure 3: Gome performance scales with model reasoning capability. As base model capability increases from Efficiency to Frontier tiers, Gome's advantage over search-based baselines widens significantly.
  • Figure 4: Optimization dynamics across model capability tiers. Gome (solid red) exhibits rapid early convergence, while MCTS (dashed blue) starts slower but catches up on weaker models. On Frontier-tier models, Gome maintains its advantage throughout.
  • Figure 5: Performance gap between Gome and the best baseline across model capability tiers. Negative values (red) indicate Gome underperforms; positive values (green) indicate Gome outperforms. The crossover occurs at the Advanced tier, with the gap widening progressively in the Frontier tier.
  • ...and 2 more figures