Table of Contents
Fetching ...

Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents

Yueqi Song, Ketan Ramaneti, Zaid Sheikh, Ziru Chen, Boyu Gou, Tianbao Xie, Yiheng Xu, Danyang Zhang, Apurva Gandhi, Fan Yang, Joseph Liu, Tianyue Ou, Zhihao Yuan, Frank Xu, Shuyan Zhou, Xingyao Wang, Xiang Yue, Tao Yu, Huan Sun, Yu Su, Graham Neubig

TL;DR

The paper addresses the fragmentation of agent training data by introducing the Agent Data Protocol (ADP), a lightweight interlingua that standardizes heterogeneous datasets into Trajectory objects with Actions and Observations. By converting 13 diverse datasets into ADP, it creates a large, unified dataset (ADP Dataset V1) totaling about 1.3M trajectories and demonstrates training-ready data across multiple agent frameworks. ADP-finetuned models achieve roughly 20% improvements and reach state-of-the-art or near-state-of-the-art performance on coding, browsing, tool use, and research benchmarks without domain-specific tuning, while enabling strong cross-task transfer and easier adaptation to new harnesses. The work emphasizes practical impacts, including linear rather than quadratic data-conversion costs and open-source release of code and data to spur scalable, reproducible agent training.

Abstract

Public research results on large-scale supervised finetuning of AI agents remain relatively rare, since the collection of agent training data presents unique challenges. In this work, we argue that the bottleneck is not a lack of underlying data sources, but that a large variety of data is fragmented across heterogeneous formats, tools, and interfaces. To this end, we introduce the agent data protocol (ADP), a light-weight representation language that serves as an "interlingua" between agent datasets in diverse formats and unified agent training pipelines downstream. The design of ADP is expressive enough to capture a large variety of tasks, including API/tool use, browsing, coding, software engineering, and general agentic workflows, while remaining simple to parse and train on without engineering at a per-dataset level. In experiments, we unified a broad collection of 13 existing agent training datasets into ADP format, and converted the standardized ADP data into training-ready formats for multiple agent frameworks. We performed SFT on these data, and demonstrated an average performance gain of ~20% over corresponding base models, and delivers state-of-the-art or near-SOTA performance on standard coding, browsing, tool use, and research benchmarks, without domain-specific tuning. All code and data are released publicly, in the hope that ADP could help lower the barrier to standardized, scalable, and reproducible agent training.

Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents

TL;DR

The paper addresses the fragmentation of agent training data by introducing the Agent Data Protocol (ADP), a lightweight interlingua that standardizes heterogeneous datasets into Trajectory objects with Actions and Observations. By converting 13 diverse datasets into ADP, it creates a large, unified dataset (ADP Dataset V1) totaling about 1.3M trajectories and demonstrates training-ready data across multiple agent frameworks. ADP-finetuned models achieve roughly 20% improvements and reach state-of-the-art or near-state-of-the-art performance on coding, browsing, tool use, and research benchmarks without domain-specific tuning, while enabling strong cross-task transfer and easier adaptation to new harnesses. The work emphasizes practical impacts, including linear rather than quadratic data-conversion costs and open-source release of code and data to spur scalable, reproducible agent training.

Abstract

Public research results on large-scale supervised finetuning of AI agents remain relatively rare, since the collection of agent training data presents unique challenges. In this work, we argue that the bottleneck is not a lack of underlying data sources, but that a large variety of data is fragmented across heterogeneous formats, tools, and interfaces. To this end, we introduce the agent data protocol (ADP), a light-weight representation language that serves as an "interlingua" between agent datasets in diverse formats and unified agent training pipelines downstream. The design of ADP is expressive enough to capture a large variety of tasks, including API/tool use, browsing, coding, software engineering, and general agentic workflows, while remaining simple to parse and train on without engineering at a per-dataset level. In experiments, we unified a broad collection of 13 existing agent training datasets into ADP format, and converted the standardized ADP data into training-ready formats for multiple agent frameworks. We performed SFT on these data, and demonstrated an average performance gain of ~20% over corresponding base models, and delivers state-of-the-art or near-SOTA performance on standard coding, browsing, tool use, and research benchmarks, without domain-specific tuning. All code and data are released publicly, in the hope that ADP could help lower the barrier to standardized, scalable, and reproducible agent training.

Paper Structure

This paper contains 21 sections, 4 figures, 9 tables.

Figures (4)

  • Figure 1: Overview of the Agent Data Protocol (ADP). Raw data from diverse sources such as AgentInstruct, CodeActInstruct, SWE-Gym, and Mind2Web are converted into a standardized ADP format. ADP unifies data into Trajectory objects, which include two core components: Actions (API action, code action, message action) and Observations (text observation, web observation). This standardized representation enables seamless integration with various agent SFT pipelines. Example transformations demonstrate how heterogeneous raw data is normalized for training agentic models.
  • Figure 2: ADP collapses many-to-many conversions into a hub-and-spoke pipeline.Left: Without ADP, each of $D$-many datasets needs a custom Raw$\rightarrow$SFT converter for each of $A$-many agentic formats (quadratic $O(D\times A)$ effort), causing duplicated code and efforts. Right: With ADP, each dataset is converted once (Raw$\rightarrow$ADP) and each agent only requires one converter (ADP$\rightarrow$SFT), yielding linear $O(D{+}A)$ effort. New datasets or agents plug in immediately to the rest of ADP.
  • Figure 3: Performance Scaling Across Agents and Benchmarks (Base vs ADP Trained)
  • Figure 4: Performance Gains Across Agents and Benchmarks.