Unified Mind Model: Reimagining Autonomous Agents in the LLM Era

Pengbo Hu, Xiang Ying

TL;DR

The paper addresses the challenge of building autonomous agents with human-like cognition by integrating large language models within a cognitive architecture. It introduces the Unified Mind Model (UMM), grounded in Global Workspace Theory, to orchestrate perception, planning, reasoning, tool use, learning, memory, reflection, and motivation, and presents MindOS as an agent-building engine enabling rapid creation of domain-specific agents via free-form instructions. The architecture combines a Specialist Module of parallel experts, a Central Processing Module (Global Workspace), and a Driver System (Background Context), with LLMs serving as world models and facilitators of cross-module communication. The work discusses implementation principles, practical operation, and future challenges, highlighting the potential impact of LLM-powered cognitive architectures on scalable, autonomous AI systems.

Abstract

Large language models (LLMs) such as ChatGPT and GPT-4 have recently demonstrated remarkable capabilities across domains, tasks, and languages, reviving research on general autonomous agents with human-like cognitive abilities. Such human-level agents require semantic comprehension and instruction-following capabilities, which are precisely the strengths of LLMs. Although there have been several initial attempts to build human-level agents based on LLMs, the theoretical foundation remains a challenging open problem. In this paper, we propose a novel theoretical cognitive architecture, the Unified Mind Model (UMM), which offers guidance to facilitate the rapid creation of autonomous agents with human-level cognitive abilities. Specifically, UMM starts from the global workspace theory and further leverages LLMs to equip the agent with various cognitive abilities, such as multi-modal perception, planning, reasoning, tool use, learning, memory, reflection, and motivation. Building upon UMM, we then develop an agent-building engine, MindOS, which allows users to quickly create domain-/task-specific autonomous agents without any programming effort.

Paper Structure

This paper contains 29 sections, 4 figures.

Figures (4)

  • Figure 1: The Architecture of UMM. UMM is built around Global Workspace Theory (GWT) and is organized as a hierarchy of three layers. The first layer is the Specialist module, which houses a variety of independent functional models. The second layer is the Central Processing module, which corresponds to the Global Workspace in GWT and regulates and manages the Specialist module. The third layer is the Driver System, which corresponds to the Background Context in GWT and dynamically adjusts the task objectives and the information-processing method. In addition, UMM incorporates a language model that supports the implementation of the various cognitive functions.
  • Figure 2: The Architecture of MindOS.
  • Figure 3: Information Processing Mode.
  • Figure 4: Three types of Architecture. (a) A standard cognitive architecture (Laird et al., 2017). (b) Architecture of Global Workspace Theory (Baars, 1993). (c) LeCun's Architecture (LeCun, 2022).
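
The three-layer structure described in the Figure 1 caption can be made concrete with a small sketch. The code below is not MindOS or the authors' implementation; it is a toy illustration of the general Global Workspace pattern, in which specialists bid for a shared workspace, a background context biases the competition, and the winning content is broadcast back to every specialist. All names (`Proposal`, `DriverSystem`, `GlobalWorkspace`, `build_demo`), the salience numbers, and the goal-matching rule are hypothetical, and the specialists are plain functions standing in for what would be LLM-backed modules.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Proposal:
    source: str      # name of the specialist that produced the proposal
    content: str     # content the specialist wants to broadcast
    salience: float  # how strongly it bids for the workspace


@dataclass
class DriverSystem:
    """Background context: holds the current goal and biases the competition."""
    goal: str

    def bias(self, proposal: Proposal) -> float:
        # Hypothetical rule: content that mentions the goal gets a fixed boost.
        boost = 1.0 if self.goal in proposal.content else 0.0
        return proposal.salience + boost


# A specialist maps the last broadcast (or None) to an optional proposal.
Specialist = Callable[[Optional[str]], Optional[Proposal]]


@dataclass
class GlobalWorkspace:
    """Central processing: runs the competition and broadcasts the winner."""
    driver: DriverSystem
    specialists: Dict[str, Specialist] = field(default_factory=dict)
    broadcast: Optional[str] = None

    def step(self) -> Optional[Proposal]:
        proposals = []
        for specialist in self.specialists.values():
            proposal = specialist(self.broadcast)
            if proposal is not None:
                proposals.append(proposal)
        if not proposals:
            return None
        winner = max(proposals, key=self.driver.bias)
        self.broadcast = winner.content  # visible to all specialists next step
        return winner


def perception(last: Optional[str]) -> Optional[Proposal]:
    return Proposal("perception", "user asked about weather", 0.6)


def memory(last: Optional[str]) -> Optional[Proposal]:
    return Proposal("memory", "user lives in Paris", 0.5)


def planner(last: Optional[str]) -> Optional[Proposal]:
    # The planner only reacts once something has been broadcast.
    if last is None:
        return None
    return Proposal("planner", f"plan: call weather tool given '{last}'", 0.9)


def build_demo() -> GlobalWorkspace:
    return GlobalWorkspace(
        driver=DriverSystem(goal="weather"),
        specialists={"perception": perception, "memory": memory, "planner": planner},
    )
```

Calling `step()` twice on `build_demo()` lets the perception proposal win first and the planner's reaction to that broadcast win second. Keeping the bias in a separate `DriverSystem` object mirrors the caption's split between the workspace and the background context, but the scoring rule itself is only a placeholder.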