
AgentFrontier: Expanding the Capability Frontier of LLM Agents with ZPD-Guided Data Synthesis

Xuanzhong Chen, Zile Qiao, Guoxin Chen, Liangcai Su, Zhen Zhang, Xinyu Wang, Pengjun Xie, Fei Huang, Jingren Zhou, Yong Jiang

TL;DR

The paper tackles the challenge of building LLM-based agents capable of deep, cross-domain reasoning by introducing a data-synthesis framework inspired by the Zone of Proximal Development (ZPD). The AgentFrontier Engine automatically generates frontier-level data in three stages: seed question generation (Stage I), agentic refinement with a tool suite (Stage II), and ZPD-based filtering (Stage III), which adjudicates each question with a Less Knowledgeable Peer (LKP) and a More Knowledgeable Other (MKO) and applies diversity controls. It also introduces the ZPD Exam, a self-evolving, automated benchmark grounded in a large, up-to-date knowledge frontier, for diagnosing agentic reasoning and tool-use capabilities. The authors show that continued pre-training on knowledge-intensive data followed by post-training on frontier trajectories yields state-of-the-art results on demanding benchmarks such as Humanity's Last Exam and the ZPD Exam, supporting the value of ZPD-guided curricula for building scalable, capable LLM agents. Overall, the work offers a scalable path to stronger agentic reasoning by tightly coupling data synthesis, dynamic evaluation, and iterative training.

Abstract

Training large language model agents on tasks at the frontier of their capabilities is key to unlocking advanced reasoning. We introduce a data synthesis approach inspired by the educational theory of the Zone of Proximal Development (ZPD), which defines this frontier as tasks an LLM cannot solve alone but can master with guidance. To operationalize this, we present the AgentFrontier Engine, an automated pipeline that synthesizes high-quality, multidisciplinary data situated precisely within the LLM's ZPD. This engine supports both continued pre-training with knowledge-intensive data and targeted post-training on complex reasoning tasks. From the same framework, we derive the ZPD Exam, a dynamic and automated benchmark designed to evaluate agent capabilities on these frontier tasks. We train the AgentFrontier-30B-A3B model on our synthesized data; it achieves state-of-the-art results on demanding benchmarks such as Humanity's Last Exam, even surpassing some leading proprietary agents. Our work demonstrates that a ZPD-guided approach to data synthesis offers a scalable and effective path toward building more capable LLM agents.
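The abstract's definition of the ZPD (tasks a model cannot solve alone but can master with guidance) maps naturally onto a filtering rule. The Python sketch below is a minimal illustration under stated assumptions, not the paper's exact criterion. It treats a base model without tools as the less knowledgeable solver and a tool-augmented agent as the more knowledgeable one. It assumes each question has already been attempted several times by both, and it uses a hypothetical any-success rule. The names `Zone`, `Attempts`, `classify`, and `zpd_filter` are illustrative only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class Zone(Enum):
    MASTERED = "mastered"    # solved without guidance: too easy to teach much
    ZPD = "zpd"              # fails alone, succeeds with guidance: keep
    BEYOND = "beyond"        # fails even with guidance: out of reach for now


@dataclass
class Attempts:
    """Correctness of repeated attempts on one question (hypothetical format)."""
    lkp: Sequence[bool]  # base model without tools (assumed LKP role)
    mko: Sequence[bool]  # tool-augmented agent (assumed MKO role)


def classify(att: Attempts) -> Zone:
    # Assumed rule: any unaided success means the task is already mastered.
    if any(att.lkp):
        return Zone.MASTERED
    # Unaided failure plus at least one guided success places it in the ZPD.
    if any(att.mko):
        return Zone.ZPD
    return Zone.BEYOND


def zpd_filter(pool: dict[str, Attempts]) -> list[str]:
    """Return the ids of questions that fall inside the ZPD, in input order."""
    return [qid for qid, att in pool.items() if classify(att) is Zone.ZPD]


pool = {
    "q1": Attempts(lkp=[True, False, False], mko=[True, True, True]),
    "q2": Attempts(lkp=[False, False, False], mko=[False, True, False]),
    "q3": Attempts(lkp=[False, False, False], mko=[False, False, False]),
}
print(zpd_filter(pool))  # ['q2']
```

In this sketch, questions the unaided model already solves carry little new training signal. Questions that no solver cracks yield no successful trajectory to learn from. Only the middle band is kept.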

Paper Structure

This paper contains 57 sections, 2 equations, 8 figures, 7 tables, and 1 algorithm.

Figures (8)

  • Figure 1: Performance of LLM agents on the text-only subset of Humanity's Last Exam (HLE) and on ZPD Exam-v1.
  • Figure 2: High-quality data situated in an LLM's ZPD acts as a catalyst, transforming it from an LKP into an MKO.
  • Figure 3: The three-stage synthesis pipeline of the AgentFrontier Engine. It begins by creating multi-source seed questions, then iteratively escalates their complexity using a tool-augmented agent, and finally filters through our ZPD-based calibration mechanism to isolate high-value training data.
  • Figure 4: An overview of our iterative refinement process. We start with a biomedical seed QA, which is then refined into a complex diagnostic reasoning problem by synthesizing knowledge from academic literature. Finally, this problem is evolved into a practical computational challenge grounded in a real-world application, a process involving web search and programmatic validation.
  • Figure 5: The ZPD Exam-v1 consists of 1024 questions categorized into 9 disciplines: Mathematics, Computer Science / Artificial Intelligence, Physics, History, Humanities, Chemistry, Biology / Medicine, Engineering, and Geography.
  • ...and 3 more figures
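Figure 3 describes the engine as a three-stage flow: multi-source seed questions, iterative complexity escalation by a tool-augmented agent, and ZPD-based filtering. The Python skeleton below sketches only that control flow, and it rests on several assumptions. `refine` and `in_zpd` are hypothetical callables standing in for the tool-augmented agent and the LKP/MKO check. The fixed number of escalation rounds is arbitrary. Exact-duplicate removal is a crude placeholder for the diversity controls mentioned in the TL;DR, whose actual mechanism is not described here. The toy stand-ins at the bottom exist only so the code runs.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Item:
    question: str
    answer: str
    history: list[str] = field(default_factory=list)  # earlier versions of the question


# Hypothetical component signatures; real stages would call LLMs and tools.
Refiner = Callable[[str, str], tuple[str, str]]  # Stage II: one escalation step
ZPDCheck = Callable[[Item], bool]                # Stage III: ZPD calibration test


def escalate(item: Item, refine: Refiner, rounds: int) -> Item:
    """Stage II: repeatedly rewrite a seed QA into a harder one, keeping its lineage."""
    for _ in range(rounds):
        question, answer = refine(item.question, item.answer)
        item = Item(question, answer, item.history + [item.question])
    return item


def synthesize(seeds: Iterable[Item], refine: Refiner, in_zpd: ZPDCheck,
               rounds: int = 2) -> list[Item]:
    kept: list[Item] = []
    seen: set[str] = set()
    for seed in seeds:                      # Stage I output: multi-source seed QAs
        item = escalate(seed, refine, rounds)
        key = " ".join(item.question.lower().split())
        if key in seen:                     # crude stand-in for diversity control
            continue
        seen.add(key)
        if in_zpd(item):                    # Stage III: keep only ZPD-calibrated items
            kept.append(item)
    return kept


# Toy stand-ins so the skeleton runs end to end.
def toy_refine(question: str, answer: str) -> tuple[str, str]:
    return f"{question} (harder)", answer


def toy_in_zpd(item: Item) -> bool:
    return "x" in item.question  # arbitrary placeholder for LKP/MKO adjudication


seeds = [Item("Compute x", "1"), Item("compute  X", "1"), Item("Name y", "2")]
out = synthesize(seeds, toy_refine, toy_in_zpd)
print([i.question for i in out])  # ['Compute x (harder) (harder)']
```

Keeping each item's `history` mirrors the lineage illustrated in Figure 4, where a biomedical seed QA becomes a diagnostic reasoning problem and then a computational challenge. This makes it easy to inspect how a kept question evolved.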