Rethinking Agent Design: From Top-Down Workflows to Bottom-Up Skill Evolution
Jiawei Du, Jinlong Wu, Yuzheng Chen, Yucheng Hu, Bing Li, Joey Tianyi Zhou
TL;DR
This work challenges the dominance of top-down agent design by proposing a bottom-up paradigm in which agents acquire, refine, and share skills through trial-and-reasoning in open-ended environments. Grounded in a formal POMDP framework, skills are built from atomic actions, evaluated via implicit rewards, and evolved through augmentation, MCTS-guided invocation, and LLM-driven refinement, all without privileged APIs. The authors instantiate and test the approach in two visually grounded, API-free games (Slay the Spire and Civilization V), demonstrating emergent competence, robust skill reuse, and a scalable, environment-agnostic reasoning loop, albeit with higher exploration costs. The work highlights the potential of experience-driven skill evolution to complement traditional top-down workflows, pointing toward decentralized, continually improving agent libraries in complex real-world settings. It also identifies efficiency, abstraction, and multi-agent coordination as directions for future improvement.
Abstract
Most LLM-based agent frameworks adopt a top-down philosophy: humans decompose tasks, define workflows, and assign agents to execute each step. While effective on benchmark-style tasks, such systems rely on designer updates and overlook agents' potential to learn from experience. Recently, Silver and Sutton (2025) envisioned a shift into a new era in which agents progress by learning from a stream of experience. In this paper, we instantiate this vision of experience-driven learning by introducing a bottom-up agent paradigm that mirrors the human learning process. Agents acquire competence through a trial-and-reasoning mechanism: exploring, reflecting on outcomes, and abstracting skills over time. Once acquired, skills can be rapidly shared and extended, enabling continual evolution rather than static replication. As more agents are deployed, their diverse experiences accelerate this collective process, making bottom-up design especially suited for open-ended environments. We evaluate this paradigm in Slay the Spire and Civilization V, where agents perceive through raw visual inputs and act via mouse outputs, the same as human players. Using a unified, game-agnostic codebase without any game-specific prompts or privileged APIs, our bottom-up agents acquire skills entirely through autonomous interaction, demonstrating the potential of the bottom-up paradigm in complex, real-world environments. Our code is available at https://github.com/AngusDujw/Bottom-Up-Agent.
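To make the trial-and-reasoning loop more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a skill is a named sequence of atomic mouse actions with a running mean of implicit reward. The names `Skill`, `SkillLibrary`, `augment`, `select`, and `prune` are hypothetical, and a simple UCB1 rule stands in for the MCTS-guided invocation mentioned in the TL;DR.

```python
import math
from dataclasses import dataclass


@dataclass
class Skill:
    """Hypothetical skill: a named sequence of atomic actions."""
    name: str
    actions: list            # e.g. [("click", x, y), ...]
    uses: int = 0
    mean_reward: float = 0.0

    def update(self, reward: float) -> None:
        # Incremental running mean of the implicit reward.
        self.uses += 1
        self.mean_reward += (reward - self.mean_reward) / self.uses


class SkillLibrary:
    """Assumed container for skills; not the paper's actual API."""

    def __init__(self, c: float = 1.4):
        self.skills = {}
        self.c = c           # exploration constant for UCB1

    def add(self, skill: Skill) -> None:
        self.skills[skill.name] = skill

    def augment(self, first: str, second: str) -> Skill:
        # Build a longer skill by concatenating two existing ones.
        a, b = self.skills[first], self.skills[second]
        combined = Skill(f"{first}+{second}", a.actions + b.actions)
        self.add(combined)
        return combined

    def select(self) -> Skill:
        # UCB1 selection (a stand-in for MCTS-guided invocation):
        # untried skills are explored first.
        total = sum(s.uses for s in self.skills.values())

        def score(s: Skill) -> float:
            if s.uses == 0:
                return math.inf
            return s.mean_reward + self.c * math.sqrt(math.log(total) / s.uses)

        return max(self.skills.values(), key=score)

    def prune(self, min_uses: int, threshold: float) -> None:
        # Drop skills that have been tried enough and still score poorly.
        self.skills = {
            n: s for n, s in self.skills.items()
            if s.uses < min_uses or s.mean_reward >= threshold
        }
```

In an agent loop, one would repeatedly call `select()`, execute the returned actions in the environment, pass the observed implicit reward to `update()`, and occasionally call `augment()` and `prune()`. The LLM-driven refinement step is omitted here because its interface is not specified in this section.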
