Unified Mind Model: Reimagining Autonomous Agents in the LLM Era
Pengbo Hu, Xiang Ying
TL;DR
The paper addresses the challenge of building autonomous agents with human-like cognition by integrating large language models (LLMs) within a cognitive architecture. It introduces the Unified Mind Model (UMM), grounded in Global Workspace Theory, to orchestrate perception, planning, reasoning, tool use, learning, memory, reflection, and motivation, and presents MindOS as an agent-building engine enabling rapid creation of domain-specific agents via free-form instructions. The architecture combines a Specialist Module of parallel experts, a Central Processing Module (Global Workspace), and a Driver System (Background Context), with LLMs serving as world models and facilitators of cross-module communication. The work discusses implementation principles, practical operation, and future challenges, highlighting the potential impact of LLM-powered cognitive architectures on scalable, autonomous AI systems.
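The interplay of the three modules can be illustrated with a minimal sketch of one global-workspace cycle. This is not the authors' implementation: the names `Proposal`, `DriverSystem`, `GlobalWorkspace`, and the specialist functions are hypothetical, the salience-based competition is an assumed simplification of Global Workspace Theory, and rule-based stubs stand in for the LLM calls that UMM would use for perception, planning, and communication between modules.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Proposal:
    source: str      # name of the specialist that produced this proposal
    content: str     # candidate message for the global workspace
    salience: float  # how strongly the specialist bids for attention


# Hypothetical specialist signature: (current broadcast, background context) -> optional proposal.
# In UMM these experts would be LLM-driven; here they are plain Python stubs.
Specialist = Callable[[str, Dict[str, str]], Optional[Proposal]]


@dataclass
class DriverSystem:
    """Hypothetical background context: long-lived goals that bias the competition."""
    context: Dict[str, str] = field(default_factory=dict)
    priorities: Dict[str, float] = field(default_factory=dict)  # per-specialist bonus

    def bias(self, p: Proposal) -> float:
        return p.salience + self.priorities.get(p.source, 0.0)


@dataclass
class GlobalWorkspace:
    specialists: Dict[str, Specialist]
    driver: DriverSystem
    history: List[str] = field(default_factory=list)

    def step(self, broadcast: str) -> Optional[str]:
        # 1. Every specialist sees the broadcast (conceptually in parallel; sequential here).
        proposals: List[Proposal] = []
        for specialist in self.specialists.values():
            p = specialist(broadcast, self.driver.context)
            if p is not None:
                proposals.append(p)
        if not proposals:
            return None
        # 2. Competition: the proposal with the highest driver-biased salience wins.
        winner = max(proposals, key=self.driver.bias)
        # 3. The winning content becomes the next broadcast to all specialists.
        self.history.append(f"{winner.source}: {winner.content}")
        return winner.content

    def run(self, stimulus: str, max_steps: int = 5) -> List[str]:
        msg: Optional[str] = stimulus
        for _ in range(max_steps):
            msg = self.step(msg)
            if msg is None:  # no specialist had anything to add
                break
        return self.history


# Rule-based stand-ins for LLM-backed experts (hypothetical).
def perception(msg: str, ctx: Dict[str, str]) -> Optional[Proposal]:
    if msg.startswith("user:"):
        return Proposal("perception", "task: " + msg[len("user:"):].strip(), 0.9)
    return None


def planner(msg: str, ctx: Dict[str, str]) -> Optional[Proposal]:
    if msg.startswith("task:"):
        return Proposal("planner", "plan: search then summarize", 0.8)
    return None


def tool_use(msg: str, ctx: Dict[str, str]) -> Optional[Proposal]:
    if msg.startswith("plan:"):
        return Proposal("tool_use", "result: 3 documents found", 0.7)
    return None


if __name__ == "__main__":
    ws = GlobalWorkspace(
        {"perception": perception, "planner": planner, "tool_use": tool_use},
        DriverSystem(context={"goal": "answer the user"}),
    )
    for entry in ws.run("user: find papers on UMM"):
        print(entry)
```

Running the example passes a user request through perception, planning, and tool use, with each winning proposal broadcast back to every specialist. Keeping the driver's bias separate from each specialist's own salience mirrors the role of the Driver System as background context: changing `priorities` changes which expert wins the workspace without modifying the experts themselves.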
Abstract
Large language models (LLMs) such as ChatGPT and GPT-4 have recently demonstrated remarkable capabilities across domains, tasks, and languages, reviving research on general autonomous agents with human-like cognitive abilities. Such human-level agents require semantic comprehension and instruction-following capabilities, which are precisely the strengths of LLMs. Although there have been several initial attempts to build human-level agents on top of LLMs, their theoretical foundation remains a challenging open problem. In this paper, we propose a novel theoretical cognitive architecture, the Unified Mind Model (UMM), which offers guidance for the rapid creation of autonomous agents with human-level cognitive abilities. Specifically, UMM builds on Global Workspace Theory and further leverages LLMs to equip the agent with various cognitive abilities, such as multi-modal perception, planning, reasoning, tool use, learning, memory, reflection, and motivation. Building upon UMM, we then develop an agent-building engine, MindOS, which allows users to quickly create domain- or task-specific autonomous agents without any programming effort.
