Cognitive Kernel: An Open-source Agent System towards Generalist Autopilots

Hongming Zhang, Xiaoman Pan, Hongwei Wang, Kaixin Ma, Wenhao Yu, Dong Yu

TL;DR

Cognitive Kernel addresses the need for autopilot systems that complete tasks independently by moving from environment-centric designs to a model-centric policy that actively perceives state information. It introduces a triad of kernels (reasoning, perception, memory) and a two-stage training regimen on an open-source Llama3 backbone, enabling end-to-end task completion from web search to local file processing. Across three use cases (real-time information management, private information management, and long-term memory management), the system achieves performance better than or comparable to that of leading closed-source agents, with clear gains when using an adapted backbone. By releasing the code and model weights, the work offers a practical, open foundation for developing generalist autopilots and highlights the synergy between policy design and system architecture.

Abstract

We introduce Cognitive Kernel, an open-source agent system towards the goal of generalist autopilots. Unlike copilot systems, which primarily rely on users to provide essential state information (e.g., task descriptions) and assist users by answering questions or auto-completing content, autopilot systems must complete tasks from start to finish independently, which requires the system to actively acquire state information from the environment. To achieve this, an autopilot system should be capable of understanding user intents, actively gathering necessary information from various real-world sources, and making sound decisions. Cognitive Kernel adopts a model-centric design. In our implementation, the central policy model (a fine-tuned LLM) initiates interactions with the environment using a combination of atomic actions, such as opening files, clicking buttons, saving intermediate results to memory, or calling the LLM itself. This differs from the widely used environment-centric design, in which a task-specific environment with predefined actions is fixed and the policy model is limited to selecting the correct action from a given set of options. Our design facilitates seamless information flow across various sources and provides greater flexibility. We evaluate our system in three use cases: real-time information management, private information management, and long-term memory management. The results demonstrate that Cognitive Kernel achieves performance better than or comparable to that of closed-source systems in these scenarios. Cognitive Kernel is fully dockerized, so anyone can deploy it privately and securely. We open-source the system and the backbone model to encourage further research on LLM-driven autopilot systems.
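
To make the contrast between the two designs concrete, here is a minimal Python sketch under our own assumptions; it is not the Cognitive Kernel implementation. `Workspace`, its three methods, `ATOMIC_ACTIONS`, `run_plan`, `toy_policy`, and the `"$prev"` placeholder are all hypothetical names, and `call_llm` is a stub that merely truncates its input. In the environment-centric step the policy can only return an index into a fixed option list; in the model-centric step the policy emits its own sequence of atomic actions, so the output of one source (a file) can flow into memory and then into an LLM call.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical whitelist of atomic actions the policy is allowed to invoke.
ATOMIC_ACTIONS = {"open_file", "save_memory", "call_llm"}

@dataclass
class Workspace:
    """Toy environment exposing a few atomic actions (illustrative only)."""
    files: dict[str, str]
    memory: dict[str, str] = field(default_factory=dict)

    def open_file(self, path: str) -> str:
        return self.files[path]

    def save_memory(self, key: str, value: str) -> str:
        self.memory[key] = value
        return value

    def call_llm(self, prompt: str) -> str:
        # Stub for a model call: it just keeps the first 20 characters.
        return prompt[:20]

def environment_centric_step(options: list[str], policy: Callable[[list[str]], int]) -> str:
    """Environment-centric: the policy may only pick one predefined option."""
    return options[policy(options)]

# A plan is a sequence of (action name, arguments); "$prev" stands for the
# result of the previous action, letting data flow from one source to the next.
Plan = list[tuple[str, tuple[Any, ...]]]

def run_plan(ws: Workspace, plan: Plan) -> Any:
    """Model-centric: execute whatever sequence of atomic actions the policy emits."""
    prev: Any = None
    for name, args in plan:
        if name not in ATOMIC_ACTIONS:
            raise ValueError(f"unknown atomic action: {name!r}")
        resolved = tuple(prev if arg == "$prev" else arg for arg in args)
        prev = getattr(ws, name)(*resolved)
    return prev

def toy_policy(task: str) -> Plan:
    """Stand-in for the fine-tuned policy LLM; ignores `task` and returns a fixed plan."""
    return [
        ("open_file", ("notes.txt",)),
        ("save_memory", ("raw_notes", "$prev")),
        ("call_llm", ("$prev",)),
        ("save_memory", ("summary", "$prev")),
    ]

if __name__ == "__main__":
    ws = Workspace(files={"notes.txt": "Meeting moved to Friday at 3pm in room B."})
    print(run_plan(ws, toy_policy("summarize my notes")))
    print(environment_centric_step(["click_search", "scroll_down"], lambda opts: 0))
```

A real policy model would generate the plan from the task and its observations rather than returning a fixed list. Restricting dispatch to a whitelist keeps arbitrary attribute access out of the executor.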

Paper Structure

This paper contains 32 sections, 3 equations, 11 figures, and 4 tables.

Figures (11)

  • Figure 1: Comparison of conceptual frameworks: (a) Copilot system, (b) Environment-centric Agent system, and (c) Cognitive Kernel system, highlighting key structural differences. After receiving a task, Cognitive Kernel evaluates whether it has all the essential state information needed to take a sound action. If not, it actively perceives the missing state information from the environment; this perception step can itself be a deeper-level, self-contained autopilot task.
  • Figure 2: The overall framework of the multi-granularity information management system.
  • Figure 3: The engineering framework of Cognitive Kernel.
  • Figure 4: Overall task completion results on the WebCanvas test set.
  • Figure 5: Performance comparison of various end-to-end systems (left) and open-source LLMs across different question types on DocBench.
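
Figure 1's caption describes a control loop: check whether the essential state information is present, and if not, perceive it, possibly by solving a nested autopilot task. The following is a minimal Python sketch of that loop under stated assumptions, not the system's actual code. `Task`, `Environment`, `perceive`, `solve`, and the `max_depth` guard are hypothetical names, the weather facts are toy data, and decisions that Cognitive Kernel delegates to its policy model are hard-coded here.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

State = dict[str, str]

@dataclass
class Task:
    goal: str
    needs: list[str]               # state items required before acting
    run: Callable[[State], str]    # action taken once the state is complete

@dataclass
class Environment:
    observable: dict[str, str] = field(default_factory=dict)  # readable in one step
    subtasks: dict[str, Task] = field(default_factory=dict)   # require solving a sub-task

def perceive(name: str, env: Environment, state: State, depth: int, max_depth: int) -> str:
    """Actively acquire one missing piece of state from the environment."""
    if name in env.observable:
        return env.observable[name]
    if name in env.subtasks:
        # The missing information is itself a self-contained task: recurse.
        return solve(env.subtasks[name], env, state, depth + 1, max_depth)
    raise LookupError(f"no way to perceive {name!r}")

def solve(task: Task, env: Environment, state: Optional[State] = None,
          depth: int = 0, max_depth: int = 3) -> str:
    """Check for missing state, perceive it if needed, then act."""
    if depth > max_depth:
        raise RecursionError(f"sub-task nesting exceeded {max_depth} at {task.goal!r}")
    state = {} if state is None else state
    for name in task.needs:
        if name not in state:
            state[name] = perceive(name, env, state, depth, max_depth)
    return task.run(state)

if __name__ == "__main__":
    # Toy data, made up for illustration.
    weather = Task("look up weather", ["user_city"], lambda s: f"rain in {s['user_city']}")
    plan = Task("plan my afternoon", ["weather"],
                lambda s: "stay indoors" if s["weather"].startswith("rain") else "go for a walk")
    env = Environment(observable={"user_city": "Seattle"}, subtasks={"weather": weather})
    print(solve(plan, env))  # stay indoors
```

The depth bound is our own addition. Because a perceived sub-task can spawn further sub-tasks, some guard against cyclic dependencies seems prudent in any recursive design of this kind.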