AgentKit: Structured LLM Reasoning with Dynamic Graphs

Yue Wu, Yewen Fan, So Yeon Min, Shrimai Prabhumoye, Stephen McAleer, Yonatan Bisk, Ruslan Salakhutdinov, Yuanzhi Li, Tom Mitchell

TL;DR

AgentKit is a node-based prompting framework that represents an end-to-end AI agent as a dynamic DAG of natural-language subtasks. Each node encapsulates one subtask as a prompt plus a list of dependencies, and is evaluated in three stages: a Compose step, an LLM query, and an optional After-query, all of which share a central database. The graph can be modified on the fly, which enables conditional branching and loops; nodes are evaluated in topological order (Kahn’s algorithm), with safeguards to keep evaluation reliable. On the Crafter and WebShop benchmarks, AgentKit agents achieve state-of-the-art performance at low cost, using hierarchical planning, short- and long-term reflection, and continuous knowledge integration. The work argues for natural-language-based coding of agents, which broadens accessibility while preserving sophisticated reasoning capabilities for complex, real-world tasks.
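The evaluation scheme summarized above (Compose, LLM query, optional After-query, shared database, Kahn's algorithm) can be illustrated with a minimal Python sketch. Every name here (`Node`, `evaluate`, `fake_llm`, `log_length`) is hypothetical and chosen for illustration; this is not the AgentKit API, and the LLM is replaced by a stub so the example runs offline.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Node:
    """One subtask: a prompt plus the names of the nodes it depends on (hypothetical type)."""
    name: str
    prompt: str
    deps: List[str] = field(default_factory=list)
    # Optional After-query hook: receives the node output and the shared database.
    after_query: Optional[Callable[[str, Dict[str, str]], None]] = None


def fake_llm(text: str) -> str:
    """Stand-in for a real LLM call; echoes the last line of the composed prompt."""
    return f"<answer to: {text.splitlines()[-1]}>"


def evaluate(nodes: Dict[str, Node], llm: Callable[[str], str]) -> Tuple[List[str], Dict[str, str]]:
    """Evaluate all nodes in topological order using Kahn's algorithm."""
    db: Dict[str, str] = {}  # central database shared by all nodes
    indegree = {n.name: len(n.deps) for n in nodes.values()}
    children: Dict[str, List[str]] = {name: [] for name in nodes}
    for n in nodes.values():
        for d in n.deps:
            children[d].append(n.name)

    ready = deque(name for name, k in indegree.items() if k == 0)
    order: List[str] = []
    while ready:
        name = ready.popleft()
        node = nodes[name]
        # Compose: dependency outputs followed by the node's own prompt.
        composed = "\n".join([db[d] for d in node.deps] + [node.prompt])
        db[name] = llm(composed)  # LLM query
        if node.after_query is not None:
            node.after_query(db[name], db)  # optional After-query
        order.append(name)
        for child in children[name]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(nodes):
        raise ValueError("cycle detected: the graph is not a DAG")
    return order, db


def log_length(output: str, db: Dict[str, str]) -> None:
    """Example After-query: store a derived value in the shared database."""
    db["outline_chars"] = str(len(output))


paper_nodes = {
    n.name: n
    for n in [
        Node("core", "Identify a core message."),
        Node("gaps", "Identify prior research gaps.", deps=["core"]),
        Node("outline", "Draft an outline.", deps=["core", "gaps"], after_query=log_length),
    ]
}
order, db = evaluate(paper_nodes, fake_llm)
print(order)
```

The final length check is one simple safeguard: if a cycle prevents some node from ever becoming ready, the traversal fails loudly instead of silently skipping it.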

Abstract

We propose an intuitive LLM prompting framework (AgentKit) for multifunctional agents. AgentKit offers a unified framework for explicitly constructing a complex "thought process" from simple natural language prompts. The basic building block in AgentKit is a node, containing a natural language prompt for a specific subtask. The user then puts together chains of nodes, like stacking LEGO pieces. The chains of nodes can be designed to explicitly enforce a naturally structured "thought process". For example, for the task of writing a paper, one may start with the thought process of 1) identify a core message, 2) identify prior research gaps, etc. The nodes in AgentKit can be designed and combined in different ways to implement multiple advanced capabilities, including on-the-fly hierarchical planning, reflection, and learning from interactions. In addition, because the design is modular and intuitively simulates an explicit human thought process, a basic agent can be implemented as simply as a list of prompts for the subtasks, and can therefore be designed and tuned by someone without any programming experience. Quantitatively, we show that agents designed through AgentKit achieve SOTA performance on WebShop and Crafter. These advances underscore AgentKit's potential in making LLM agents effective and accessible for a wider range of applications. https://github.com/holmeswww/AgentKit

Paper Structure (39 sections, 3 equations, 5 figures, 2 tables, 2 algorithms)

This paper contains 39 sections, 3 equations, 5 figures, 2 tables, 2 algorithms.

Figures (5)

  • Figure 1: A user breaks down a task into subtasks (nodes) representing a "thought process" and creates prompts for the subtasks (nodes). Subtasks (nodes) in AgentKit can be designed and assembled in different ways to achieve diverse functionalities, similar to LEGO pieces.
  • Figure 2: Each node in AgentKit takes outputs from its dependencies and outputs a string to complete a predefined subtask. The orange components (After-query) are optional and can be further customized with minimal programming through the AgentKit API. Left: The evaluation process inside a node consists of compose and after-query. Right: Nodes can be dynamically added or removed at inference time. For example, the after-query operation of $n_7$ adds a conditional node, $n_{+}$ or $n_{-}$, based on a yes/no answer from the LLM to the node query. This induces conditional branching.
  • Figure 3: Node names are abbreviated for space. (a) At every step in the game, three summary nodes (green) $n_{\text{s-obs}}$, $n_{\text{s-plan}}$, $n_{\text{s-action}}$ summarize the observation, plan, and action of the current step. (b) At step $T$, all planner nodes (blue) take $o_{T-1}$, $o_{T}$, and the manual $\mathcal{I}$ as input, and output 3 subgoals and a skill $s^T$. $n_{\text{reflect}}$ reflects on the summary of the 25 most recent steps, and $n_{\text{challenge}}$ and $n_{\text{gate}}$ determine whether the subgoals from step $T-1$ are carried over or updated. (c) Every 3 steps under skill $s^T$, $n_{\text{feed}}$ (purple) reflects on all gameplay history under $s^T$ and generates skill-specific feedback for the planner in (b). (d) At every step $T$, $n_{\text{kb-add}}$ (gray) examines $o_{T-1}$, $o_{T}$, and $\mathcal{I}$ to identify new information from $L_{unk}$. $n_{\text{unknown}}$ adds to $L_{unk}$ by identifying missing information from $\mathcal{I}$ for the current subgoal.
  • Figure 4: Left three columns: an example trajectory in Crafter. Different nodes for planning, reflection, feedback, and knowledge discovery work together to complete the first 11 steps and successfully craft the table. Through environment interactions and error identification/correction, the agent discovered two pieces of information, "wood per Do action" and "table wood consumption", that were originally omitted from the Crafter instructions. Right column: the end-of-game distribution of all actions (classified into the categories Move, Do (Interact), and Craft) taken by the agent, for each skill in the skill library. The action distribution aligns well with human expectations based on skill names.
  • Figure 5: Screenshot of our command line interface (CLI) to produce a graph without coding.
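
The conditional branching described in the Figure 2 caption, where the after-query of $n_7$ adds either $n_{+}$ or $n_{-}$ depending on a yes/no answer, can be sketched as follows. This is an illustrative assumption rather than AgentKit's implementation: `Graph`, `run`, `branch`, `build`, `scripted_llm`, and the prompts are all hypothetical. Instead of a precomputed Kahn queue, the sketch recomputes the set of ready nodes at every step so that nodes added mid-run are picked up, and it caps the number of evaluated nodes as a simple safeguard.

```python
from typing import Callable, Dict, List, Optional, Sequence


class Graph:
    """A mutable graph of prompts; hooks may add nodes while it is being evaluated."""

    def __init__(self) -> None:
        self.prompts: Dict[str, str] = {}
        self.deps: Dict[str, List[str]] = {}
        self.hooks: Dict[str, Callable[["Graph", str], None]] = {}

    def add_node(
        self,
        name: str,
        prompt: str,
        deps: Sequence[str] = (),
        hook: Optional[Callable[["Graph", str], None]] = None,
    ) -> None:
        self.prompts[name] = prompt
        self.deps[name] = list(deps)
        if hook is not None:
            self.hooks[name] = hook


def run(graph: Graph, llm: Callable[[str], str], max_nodes: int = 50) -> Dict[str, str]:
    """Evaluate nodes whose dependencies are done until none remain."""
    db: Dict[str, str] = {}
    while len(db) < len(graph.prompts):
        if len(db) >= max_nodes:
            raise RuntimeError("node budget exceeded")  # guards against runaway growth
        ready = [
            n for n in graph.prompts
            if n not in db and all(d in db for d in graph.deps[n])
        ]
        if not ready:
            raise ValueError("no evaluable node: cycle or missing dependency")
        name = ready[0]
        context = [db[d] for d in graph.deps[name]]
        db[name] = llm("\n".join(context + [graph.prompts[name]]))
        hook = graph.hooks.get(name)
        if hook is not None:
            hook(graph, db[name])  # after-query may modify the graph
    return db


def branch(graph: Graph, answer: str) -> None:
    """After-query for n7: add n_plus on a 'yes' answer, otherwise n_minus."""
    if answer.strip().lower().startswith("yes"):
        graph.add_node("n_plus", "Carry on with the current plan.", deps=["n7"])
    else:
        graph.add_node("n_minus", "Propose a new plan.", deps=["n7"])


def build() -> Graph:
    g = Graph()
    g.add_node("n7", "Is the current plan still valid?", hook=branch)
    return g


def scripted_llm(reply: str) -> Callable[[str], str]:
    """Stub LLM: answers questions with `reply`, echoes any other prompt."""
    return lambda text: reply if text.endswith("?") else f"done: {text.splitlines()[-1]}"


print(list(run(build(), scripted_llm("yes"))))
print(list(run(build(), scripted_llm("no"))))
```

Recomputing the ready set on each iteration costs more than maintaining in-degree counters, but it keeps the sketch short and makes graph edits from hooks trivially safe.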