AgentSwift: Efficient LLM Agent Design via Value-guided Hierarchical Search

Yu Li, Lehui Li, Zhihao Wu, Qingmin Liao, Jianye Hao, Kun Shao, Fengli Xu, Yong Li

TL;DR

AgentSwift addresses the cost and inefficiency of automated LLM agent design. It formalizes a hierarchical search space that jointly optimizes agentic workflows and modular components (memory, tool use, planning), and pairs it with a low-cost value model trained on a combinatorially constructed dataset. An uncertainty-guided hierarchical MCTS drives exploration: recombination, mutation, and refinement generate candidate agents, and uncertainty informs selection to balance exploration and exploitation. Across seven benchmarks, AgentSwift yields consistent performance gains over hand-designed agents and prior automated methods, is largely model-agnostic, and discovers strong agents faster while remaining cost-efficient. The framework is intended as a practical, scalable launchpad for rapidly identifying powerful, adaptable LLM agent architectures in diverse domains.

Abstract

Large language model (LLM) agents have demonstrated strong capabilities across diverse domains, yet automated agent design remains a significant challenge. Current automated agent design approaches are often constrained by limited search spaces that primarily optimize workflows but fail to integrate crucial human-designed components like memory, planning, and tool use. Furthermore, these methods are hampered by high evaluation costs, as evaluating even a single new agent on a benchmark can require tens of dollars. The difficulty of this exploration is further exacerbated by inefficient search strategies that struggle to navigate the large design space effectively, making the discovery of novel agents a slow and resource-intensive process. To address these challenges, we propose AgentSwift, a novel framework for automated agent design. We formalize a hierarchical search space that jointly models agentic workflow and composable functional components. This structure moves beyond optimizing workflows alone by co-optimizing functional components, which enables the discovery of more complex and effective agent architectures. To make exploration within this expansive space feasible, we mitigate high evaluation costs by training a value model on a high-quality dataset, generated via a novel strategy combining combinatorial coverage and balanced Bayesian sampling for low-cost evaluation. Guiding the entire process is a hierarchical MCTS strategy, which is informed by uncertainty to efficiently navigate the search space. Evaluated across a comprehensive set of seven benchmarks spanning embodied, math, web, tool, and game domains, AgentSwift discovers agents that achieve an average performance gain of 8.34% over both existing automated agent search methods and manually designed agents. Our framework serves as a launchpad for researchers to rapidly discover powerful agent architectures.
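
The abstract does not give the selection rule used by the uncertainty-informed MCTS, so the following is only an illustrative sketch of how an uncertainty term could enter node selection. It assumes a standard UCT score with an additive uncertainty bonus; the `Node` structure, the `uncertainty` field, and the weights `c` and `lam` are all hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One candidate agent design in the search tree (hypothetical structure)."""
    name: str
    visits: int = 0
    total_value: float = 0.0   # sum of predicted scores backed up through this node
    uncertainty: float = 0.0   # assumed: how unreliable the predicted score is
    children: List["Node"] = field(default_factory=list)

    def mean_value(self) -> float:
        # Average backed-up value; 0.0 for a node that was never visited.
        return self.total_value / self.visits if self.visits else 0.0


def selection_score(parent: Node, child: Node, c: float = 1.0, lam: float = 0.5) -> float:
    """UCT score plus an additive uncertainty bonus (the bonus form is an assumption)."""
    if child.visits == 0:
        return math.inf  # unvisited children are always tried first
    explore = c * math.sqrt(math.log(parent.visits) / child.visits)
    return child.mean_value() + explore + lam * child.uncertainty


def select_child(parent: Node) -> Node:
    """Pick the child with the highest uncertainty-aware score."""
    return max(parent.children, key=lambda ch: selection_score(parent, ch))
```

With two children that have equal mean value and visit counts, this rule prefers the one whose estimate is more uncertain, which is one simple way to steer evaluation budget toward designs the value model is least sure about.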

Paper Structure

This paper contains 42 sections, 11 equations, 9 figures, 6 tables, 1 algorithm.

Figures (9)

  • Figure 1: Overview of our framework. The framework integrates (a) a hierarchical search space, (b) uncertainty-guided MCTS with hierarchical expansion, and (c) value model training.
  • Figure 2: Overview of dataset construction
  • Figure 3: AgentSwift search trajectory on Alfworld and M3ToolEval.
  • Figure 4: Left: search trajectories of different search strategies on Alfworld: AgentSwift, w/o uncertainty, and w/o MCTS. Right: search trajectories of different evaluation methods on Alfworld: AgentSwift, gpt-4o prediction, and full evaluation.
  • Figure 5: Performance comparison on M3ToolEval under few-shot adaptation.
  • ...and 4 more figures