Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training

Tianqing Fang, Zhisong Zhang, Xiaoyang Wang, Rui Wang, Can Qin, Yuxuan Wan, Jun-Yu Ma, Ce Zhang, Jiaqi Chen, Xiyun Li, Hongming Zhang, Haitao Mi, Dong Yu

TL;DR

Cognitive Kernel-Pro introduces a fully open-source, multi-module framework for deep research agents that emphasizes modularity, data-centric training, and inference-time optimization. It combines a main coordinating agent with specialized sub-agents to handle web navigation, file processing, and tool use, all powered by a shared agent foundation model and Python-based actions. The authors present a comprehensive data-collection and augmentation recipe (including multi-hop web data, exploration-based synthesis, and Persona Hub prompts) and demonstrate state-of-the-art results on GAIA among open-source, free-tool agents, with an 8B CK-Pro model outperforming prior open-source rivals. The work also showcases techniques like reflection and voting to improve robustness, and provides a public codebase to foster reproducibility and further research in open agent systems.

Abstract

General AI Agents are increasingly recognized as foundational frameworks for the next generation of artificial intelligence, enabling complex reasoning, web interaction, coding, and autonomous research capabilities. However, current agent systems are either closed-source or heavily reliant on a variety of paid APIs and proprietary tools, limiting accessibility and reproducibility for the research community. In this work, we present Cognitive Kernel-Pro, a fully open-source and (to the maximum extent) free multi-module agent framework designed to democratize the development and evaluation of advanced AI agents. Within Cognitive Kernel-Pro, we systematically investigate the curation of high-quality training data for Agent Foundation Models, focusing on the construction of queries, trajectories, and verifiable answers across four key domains: web, file, code, and general reasoning. Furthermore, we explore novel strategies for agent test-time reflection and voting to enhance agent robustness and performance. We evaluate Cognitive Kernel-Pro on GAIA, achieving state-of-the-art results among open-source and free agents. Notably, our 8B-parameter open-source model surpasses previous leading systems such as WebDancer and WebSailor, establishing a new performance standard for accessible, high-capability AI agents. Code is available at https://github.com/Tencent/CognitiveKernel-Pro

Paper Structure

This paper contains 40 sections, 4 figures, 6 tables.

Figures (4)

  • Figure 1: (a) Performance comparison on the full GAIA development set (number of examples $n$=165). The left panel presents results from our open-source Cognitive Kernel-Pro framework, utilizing our Qwen3-8B SFT model and Claude-3.7 as foundation models with exclusively free tools. The right panel displays Pass@1 scores for proprietary agents and open-source systems employing paid tools. (b) Performance on the text-only GAIA subset ($n$=103), demonstrating our 8B model's superiority over 7B models in the WebDancer/WebSailor family ($\sim$2% higher Pass@1, over 10% higher Pass@3).
  • Figure 2: Technical roadmap showcasing prior innovations from Tencent AI Lab (Cognitive Kernel, WebVoyager, etc.) and their integration into Cognitive Kernel-Pro via three core components: agent framework development, agent data construction, and agent foundation model training. Yellow blocks highlight novel contributions in this work and the corresponding section numbers.
  • Figure 3: Overview of the Cognitive Kernel-Pro Agent Framework. The left panel illustrates the functionality of the agent class, where the main agent, web agent, and file agent inherit from a common base class. The planner maintains a state dictionary containing 'completed_list', 'todo_list', 'experience', and 'information'. The action generator either produces Python code, acting as a code agent, or invokes predefined functions of sub-agents (such as the web agent) and other built-in tools. The right panel illustrates the hierarchical structure of Cognitive Kernel-Pro, listing all functions defined by each agent. Additionally, a standalone reflection module assesses task completion; if the task is incomplete, the agent retries. The same agent foundation model backs every module and sub-agent.
  • Figure 4: Illustration of information aggregation in the creation of URLQA.
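
The planner state and the reflection-driven retry described in the Figure 3 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: only the four dictionary keys come from the caption. The function names (new_planner_state, run_with_reflection), the use of lists as values, the max_retries parameter, and the callback signatures are all assumptions made for this example.

```python
from typing import Any, Callable, Optional, Tuple


def new_planner_state() -> dict:
    """Fresh planner state with the four keys named in the Figure 3 caption.

    Storing each field as a list is an assumption of this sketch.
    """
    return {
        "completed_list": [],  # steps already finished
        "todo_list": [],       # steps still to be done
        "experience": [],      # lessons learned from earlier steps
        "information": [],     # facts gathered so far
    }


def run_with_reflection(
    task: str,
    attempt: Callable[[str, dict], Any],
    is_complete: Callable[[str, Any], bool],
    max_retries: int = 3,
) -> Tuple[Optional[Any], int]:
    """Run `attempt` until the reflection check passes or retries run out.

    `attempt(task, state)` stands in for one full agent run (hypothetical).
    `is_complete(task, answer)` stands in for the reflection module
    (hypothetical). Returns the last answer and the number of trials used.
    """
    answer = None
    for trial in range(1, max_retries + 1):
        state = new_planner_state()
        state["todo_list"].append(task)
        answer = attempt(task, state)
        if is_complete(task, answer):
            return answer, trial
    return answer, max_retries
```

In this sketch every retry starts from a fresh state; whether the real framework carries 'experience' across retries is not stated in the caption, so that choice is also an assumption.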