Scaling Agents via Continual Pre-training

Liangcai Su, Zhen Zhang, Guangyu Li, Zhuo Chen, Chenxi Wang, Maojia Song, Xinyu Wang, Kuan Li, Jialong Wu, Xuanzhong Chen, Zile Qiao, Zhongwang Zhang, Huifeng Yin, Shihao Cai, Runnan Fang, Zhengwei Tao, Wenbiao Yin, Chenxiong Qian, Yong Jiang, Pengjun Xie, Fei Huang, Jingren Zhou

TL;DR

AgentFounder introduces Agentic Continual Pre-training (Agentic CPT) as an intermediate, scalable stage between pre-training and post-training to seed agentic capabilities in foundation models. Through First-order Action Synthesis and High-order Action Synthesis, plus a two-stage CPT with long-context exposure, the approach yields a pre-aligned agentic base that can be fine-tuned for downstream tasks. AgentFounder-30B achieves state-of-the-art results across 10 benchmarks, including strong tool-use and robust generalization, while exhibiting favorable scaling with model size and data volume. This work demonstrates that integrating agentic training into the pre-training phase can significantly improve learning efficiency and performance for deep research agents, offering a practical path toward more capable and adaptable autonomous systems.

Abstract

Large language models (LLMs) have evolved into agentic systems capable of autonomous tool use and multi-step reasoning for complex problem-solving. However, post-training approaches building upon general-purpose foundation models consistently underperform in agentic tasks, particularly in open-source implementations. We identify the root cause: the absence of robust agentic foundation models forces models during post-training to simultaneously learn diverse agentic behaviors while aligning them to expert demonstrations, thereby creating fundamental optimization tensions. To this end, we are the first to propose incorporating Agentic Continual Pre-training (Agentic CPT) into the deep research agent training pipeline to build powerful agentic foundation models. Based on this approach, we develop a deep research agent model named AgentFounder. We evaluate our AgentFounder-30B on 10 benchmarks and achieve state-of-the-art performance while retaining strong tool-use ability, notably 39.9% on BrowseComp-en, 43.3% on BrowseComp-zh, and 31.5% Pass@1 on HLE.

Paper Structure

This paper contains 39 sections, 1 equation, 13 figures, 6 tables.

Figures (13)

  • Figure 1: Performance comparison between AgentFounder and state-of-the-art deep research agents.
  • Figure 2: Agentic Training Pipeline.
  • Figure 3: Multi-Style Question-Answer Generation Based on Scalable Information Sources.
  • Figure 4: Planning Action Synthesis.
  • Figure 5: Comparison of high-order action synthesis data and the original trajectory.
  • ...and 8 more figures