DynaSaur: Large Language Agents Beyond Predefined Actions

Dang Nguyen, Viet Dac Lai, Seunghyun Yoon, Ryan A. Rossi, Handong Zhao, Ruiyi Zhang, Puneet Mathur, Nedim Lipka, Yu Wang, Trung Bui, Franck Dernoncourt, Tianyi Zhou

TL;DR

DynaSaur reframes LLM agents to dynamically generate and compose actions as Python functions, overcoming the rigidity of fixed action sets. Actions are accumulated over time, enabling reuse and complex behavior through composition, and an action retrieval mechanism selects relevant generated functions. Empirical results across GAIA, MATH, TabMWP, AIME, and GPQA show substantial performance gains and robustness, with ablations confirming the contributions of action implementation, accumulation, and initial tooling. The framework maintains compatibility with human-designed tools and can incorporate external tools, highlighting practical impact for open-ended, real-world tasks while acknowledging safety considerations for code execution.

Abstract

Existing LLM agent systems typically select actions from a fixed and predefined set at every step. While this approach is effective in closed, narrowly scoped environments, it presents two major challenges for real-world, open-ended scenarios: (1) it significantly restricts the planning and acting capabilities of LLM agents, and (2) it requires substantial human effort to enumerate and implement all possible actions, which is impractical in complex environments with a vast number of potential actions. To address these limitations, we propose an LLM agent framework that can dynamically create and compose actions as needed. In this framework, the agent interacts with its environment by generating and executing programs written in a general-purpose programming language. Moreover, generated actions are accumulated over time for future reuse. Our extensive experiments across multiple benchmarks show that this framework significantly improves flexibility and outperforms prior methods that rely on a fixed action set. Notably, it enables LLM agents to adapt and recover in scenarios where predefined actions are insufficient or fail due to unforeseen edge cases. Our code is available at https://github.com/adobe-research/dynasaur.
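The core idea in the abstract (actions are programs, and programs that run successfully are kept for reuse) can be illustrated with a minimal Python sketch. This is not the authors' implementation: `execute_action`, `generated_actions`, and the use of plain `exec` in a shared namespace are assumptions made for illustration, standing in for whatever execution backend the real system uses.

```python
import contextlib
import io
import traceback
from typing import Callable, Dict, Tuple

# Hypothetical store for accumulated actions; the name is illustrative only.
generated_actions: Dict[str, Callable] = {}


def execute_action(code: str, namespace: dict) -> Tuple[str, bool]:
    """Run agent-written code and return (observation, succeeded)."""
    before = set(namespace)
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            # Assumption: plain exec, no sandbox. Real use needs isolation.
            exec(code, namespace)
    except Exception:
        # On failure, the error message is what the agent observes.
        return traceback.format_exc(limit=1), False
    # On success, keep any newly defined functions for later reuse.
    for name in set(namespace) - before:
        if callable(namespace[name]) and not name.startswith("_"):
            generated_actions[name] = namespace[name]
    return buffer.getvalue(), True


ns: dict = {}
# Step 1: the agent defines a new action and calls it.
obs, ok = execute_action(
    "def area(w, h):\n    return w * h\nprint(area(3, 4))", ns
)
# Step 2: a later step composes the stored action instead of rewriting it.
obs2, ok2 = execute_action("print(area(2, 5) + 1)", ns)
# Step 3: a failing program yields an error observation, not a crash.
obs3, ok3 = execute_action("1 / 0", ns)
```

Returning the traceback as an ordinary observation is what lets an agent see why a step failed and try a different program on the next turn. Because the sketch runs untrusted code with no isolation, it should only be read as an outline of the control flow.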

Paper Structure

This paper contains 34 sections, 1 equation, 10 figures, 5 tables.

Figures (10)

  • Figure 1: Illustration of the DynaSaur agent framework. The agent $\pi$ receives a task $t$ and optionally a set of human-designed actions $\mathcal{A}^u$. It then interacts with an environment $\mathcal{E}$ by proposing an action $a \in \mathcal{A}$, implemented as a Python function. The action is executed in an IPython kernel, which may interface with the operating system, the internet, or the action retriever as needed. The result, either the output of the function or an error message, is returned to the agent as an observation $o$. Generated actions that execute successfully are accumulated into $\mathcal{A}^g$.
  • Figure 2: Distribution of error types in tasks where agent A (without action implementation) answers incorrectly, while agent B (with action implementation) answers correctly.
  • Figure 3: Mean coverage over the validation set as the number of actions increases. The red dashed line marks the point where human-designed actions are added.
  • Figure 4: Categories of actions accumulated during evaluation on the GAIA validation set.
  • Figure 5: A case study demonstrating the difference in problem-solving flexibility between Agent A (a variant of DynaSaur without action implementation) and Agent B (the proposed agent framework). Both agents begin with the same initial step, but only Agent B, equipped with the ability to implement its own actions, successfully completes the task. Due to space constraints, the first step taken by Agent B is not shown.
  • ...and 5 more figures
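
The Figure 1 caption mentions an action retriever that the agent can query for previously generated actions. The summary above does not say how retrieval is scored, so the sketch below swaps in a simple token-overlap ranking over action names and descriptions purely as a stand-in. `retrieve_actions`, `tokenize`, and the example descriptions are all hypothetical.

```python
import re
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_actions(query: str, docs: Dict[str, str], k: int = 2) -> List[str]:
    """Return up to k action names whose name and description overlap the query."""
    q = tokenize(query)
    scored = []
    for name, doc in docs.items():
        overlap = len(q & tokenize(name + " " + doc))
        if overlap > 0:
            scored.append((overlap, name))
    # Highest overlap first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


# Illustrative action descriptions, not taken from the paper.
action_docs = {
    "read_csv_column": "Read one column from a CSV file.",
    "download_page": "Download the HTML of a web page.",
    "sum_numbers": "Add up a list of numbers.",
}
best = retrieve_actions("read a column from the csv file", action_docs, k=1)
none_found = retrieve_actions("weather forecast", action_docs)
```

Token overlap is sensitive to common words such as "a" and "the", so a practical retriever would likely use a stronger similarity measure. The point of the sketch is only the interface: a query goes in, and a short list of stored action names comes back for the agent to reuse.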