Rethinking Agent Design: From Top-Down Workflows to Bottom-Up Skill Evolution

Jiawei Du, Jinlong Wu, Yuzheng Chen, Yucheng Hu, Bing Li, Joey Tianyi Zhou

TL;DR

This work challenges the dominance of top-down agent design by proposing a bottom-up paradigm where agents acquire, refine, and share skills through trial-and-reasoning in open-ended environments. Grounded in a formal POMDP framework, skills are built from atomic actions, evaluated via implicit rewards, and evolved through augmentation, MCTS-guided invocation, and LLM-driven refinement, all without privileged APIs. The authors instantiate and test the approach in two visually grounded, API-free games (Slay the Spire and Civilization V), demonstrating emergent competence, robust skill reuse, and a scalable, environment-agnostic reasoning loop, albeit with higher exploration costs. The work highlights the potential of experience-driven skill evolution to complement traditional top-down workflows, paving the way for decentralized, continually improving agent libraries in complex real-world settings, while signaling important avenues for future improvements in efficiency, abstraction, and multi-agent coordination.

Abstract

Most LLM-based agent frameworks adopt a top-down philosophy: humans decompose tasks, define workflows, and assign agents to execute each step. While effective on benchmark-style tasks, such systems rely on designer updates and overlook agents' potential to learn from experience. Recently, Silver and Sutton (2025) envisioned a shift into a new era in which agents progress by learning from a stream of experiences. In this paper, we instantiate this vision of experience-driven learning by introducing a bottom-up agent paradigm that mirrors the human learning process. Agents acquire competence through a trial-and-reasoning mechanism: exploring, reflecting on outcomes, and abstracting skills over time. Once acquired, skills can be rapidly shared and extended, enabling continual evolution rather than static replication. As more agents are deployed, their diverse experiences accelerate this collective process, making bottom-up design especially suited for open-ended environments. We evaluate this paradigm in Slay the Spire and Civilization V, where agents perceive through raw visual inputs and act via mouse outputs, the same as human players. Using a unified, game-agnostic codebase without any game-specific prompts or privileged APIs, our bottom-up agents acquire skills entirely through autonomous interaction, demonstrating the potential of the bottom-up paradigm in complex, real-world environments. Our code is available at https://github.com/AngusDujw/Bottom-Up-Agent.

Paper Structure

This paper contains 16 sections, 4 equations, 8 figures, 3 tables, 1 algorithm.

Figures (8)

  • Figure 1: Two paradigms of agent design. Most existing agent frameworks can be categorized as Top-down agents, which rely on pre-engineered architectures: they begin with high-level goals, decompose them into subtasks, and execute workflows using task-specific APIs and tools. In contrast, we propose Bottom-Up agents to function as explorers: starting from zero prior knowledge, they gradually acquire skills through trial-and-reasoning, evolving autonomously via implicit reward inferred from environmental change.
  • Figure 2: Left: The bottom-up agent operates solely on raw visual input and simulates low-level mouse and keyboard actions. Without explicit rewards, it learns and refines skills based on implicit signals like visual changes or game progression. Right: Game progression measured by Civilization V’s tech tree and visual changes. Our bottom-up agent (blue) outperforms all baselines, including those with task-related priors.
  • Figure 3: Overview of Bottom-Up Skill Evolution. The agent begins with no predefined skills and gradually builds its library $\mathcal{S}$ through interaction. Left: New skills are incrementally composed by extending existing routines with atomic actions. Middle: Skills are evaluated by a visual-language model (VLM) comparing pre- and post-execution states; ineffective ones are refined or discarded via LLM reasoning. Right: At each timestep, a candidate set $\mathbb{S}_t$ is selected based on the current state $x_t$ and evaluated via Monte Carlo Tree Search (MCTS) to choose the most promising skill. All components operate under a unified reasoning framework, without privileged APIs, allowing agents to acquire competence purely from experience.
  • Figure 4: Analysis of skill evolution and reuse. (a) Skill library size increases over time through augmentation (+) and pruning (–). (b) Top-10 most frequently invoked skills in Slay the Spire. (c) Examples of compositional skill inheritance across environments, showing how higher-level routines are built from atomic actions.
  • Figure 5: Prompting and execution visualization of the bottom-up agent. (a) Environment-agnostic prompts used for skill augmentation and invocation, enabling reasoning without access to game-specific APIs. (b) We design a GUI to visualize the execution state of the agent during gameplay, showing candidate actions, the selected goal, reasoning metadata, and the corresponding skill plan tree.
  • ...and 3 more figures
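
The skill-evolution loop described in the Figure 3 and Figure 4 captions (augmenting a library by extending routines with atomic actions, scoring skills, pruning ineffective ones, and choosing which skill to invoke) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names `Skill`, `SkillLibrary`, `augment`, `record`, `prune`, and `select`, and the thresholds used, are all hypothetical. Two simplifications are assumed: the reward is a plain number supplied by the caller, standing in for the VLM comparison of pre- and post-execution states, and selection uses a flat UCB1 score (the rule commonly used inside UCT-style MCTS) rather than a full tree search.

```python
import math
from dataclasses import dataclass


@dataclass
class Skill:
    """A skill is an ordered sequence of atomic action names (hypothetical)."""
    actions: tuple
    visits: int = 0
    total_reward: float = 0.0

    @property
    def mean_reward(self) -> float:
        return self.total_reward / self.visits if self.visits else 0.0


class SkillLibrary:
    """Toy stand-in for the skill library S; not the paper's data structure."""

    def __init__(self):
        self.skills = {}  # maps action tuple -> Skill

    def augment(self, base, atomic_action):
        """Compose a new skill by appending one atomic action to `base`.

        Pass base=None to create a single-action skill.
        """
        actions = (base.actions if base else ()) + (atomic_action,)
        return self.skills.setdefault(actions, Skill(actions))

    def record(self, skill, reward):
        """Store an implicit reward (assumed to be computed elsewhere)."""
        skill.visits += 1
        skill.total_reward += reward

    def prune(self, min_visits=3, min_mean=0.1):
        """Drop skills that were tried often enough and still score poorly."""
        doomed = [key for key, s in self.skills.items()
                  if s.visits >= min_visits and s.mean_reward < min_mean]
        for key in doomed:
            del self.skills[key]
        return len(doomed)

    def select(self, c=1.4):
        """Pick a skill by UCB1: mean reward plus an exploration bonus.

        Simplification: the paper describes MCTS-guided invocation; this
        flat bandit rule only mimics the per-node selection step.
        """
        if not self.skills:
            raise ValueError("skill library is empty")
        total = sum(s.visits for s in self.skills.values())

        def ucb(s):
            if s.visits == 0:
                return math.inf  # always try unvisited skills first
            return s.mean_reward + c * math.sqrt(math.log(total) / s.visits)

        return max(self.skills.values(), key=ucb)
```

In use, one would call `augment` to grow the library, `select` to choose a skill, execute it in the environment, pass the observed reward to `record`, and call `prune` periodically. The additions and removals correspond loosely to the (+) and (–) events in Figure 4(a).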