AgentDistill: Training-Free Agent Distillation with Generalizable MCP Boxes

Jiahao Qiu; Xinzhe Juan; Yimin Wang; Ling Yang; Xuan Qi; Tongcheng Zhang; Jiacheng Guo; Yifu Lu; Zixin Yao; Hongru Wang; Shilong Liu; Xun Jiang; Liu Leqi; Mengdi Wang

TL;DR

AgentDistill introduces a training-free agent distillation pipeline that transfers task-solving capabilities from large teacher agents to small student agents through distilled Model-Context-Protocols (MCPs). By extracting MCPs from successful trajectories and consolidating them into a reusable MCP-Box, students can perform tool-based reasoning at inference time without fine-tuning or trajectory replay. Across biomedical and mathematical benchmarks, MCP-equipped students approach or match teacher performance and outperform retrieval-based baselines, demonstrating strong generalization with low overhead. The approach decouples task semantics from implementation, enabling scalable, domain-agnostic tool usage and efficient deployment of lightweight agents in novel environments.

Abstract

While knowledge distillation has become a mature field for compressing large language models (LLMs) into smaller ones by aligning their outputs or internal representations, the distillation of LLM-based agents, which involve planning, memory, and tool use, remains relatively underexplored. Existing agent distillation methods typically replay full teacher trajectories or imitate step-by-step teacher tool usage, but they often struggle to train student agents to dynamically plan and act in novel environments. We propose AgentDistill, a novel, training-free agent distillation framework that enables efficient and scalable knowledge transfer via direct reuse of Model-Context-Protocols (MCPs), which are structured and reusable task-solving modules autonomously generated by teacher agents. The reuse of these distilled MCPs enables student agents to generalize their capabilities across domains and solve new problems with minimal supervision or human intervention. Experiments on biomedical and mathematical benchmarks demonstrate that our distilled student agents, built on small language models, can achieve performance comparable to advanced systems built on large LLMs, such as OctoTools (GPT-4o), highlighting the effectiveness of our framework in building scalable and cost-efficient intelligent agents.
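
To make the pipeline concrete, here is a minimal sketch of the flow the abstract describes: collect the tools a teacher produced, keep only those from successful trajectories, consolidate them into one box, and let a student call them at inference time. This is not the paper's implementation. `Trajectory`, `build_mcp_box`, and `student_solve` are hypothetical names, consolidation is reduced to de-duplicating by tool name, and the student is a plain lookup rather than a language-model agent choosing tools.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Tool = Callable[..., object]


@dataclass
class Trajectory:
    """Hypothetical record of one teacher run: task, outcome, and tools it wrote."""
    task: str
    success: bool
    tools: Dict[str, Tool] = field(default_factory=dict)


def build_mcp_box(trajectories: List[Trajectory]) -> Dict[str, Tool]:
    """Keep tools from successful runs only; the first definition of a name wins.

    Assumption: real consolidation would also generalize and merge tools;
    here it is simplified to name-based de-duplication.
    """
    box: Dict[str, Tool] = {}
    for traj in trajectories:
        if not traj.success:
            continue  # failed trajectories contribute nothing to the box
        for name, fn in traj.tools.items():
            box.setdefault(name, fn)
    return box


def student_solve(box: Dict[str, Tool], tool_name: str, *args: object) -> object:
    """Stand-in for the student agent: look up a distilled tool and call it."""
    if tool_name not in box:
        raise KeyError(f"no distilled tool named {tool_name!r}")
    return box[tool_name](*args)


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


def broken_mean(values: List[float]) -> float:
    return sum(values)  # deliberately wrong; comes from a failed run


teacher_runs = [
    Trajectory("average the readings", True, {"mean": mean}),
    Trajectory("average the readings", False, {"mean": broken_mean, "scratch": len}),
]

mcp_box = build_mcp_box(teacher_runs)
result = student_solve(mcp_box, "mean", [2.0, 4.0, 6.0])
print(sorted(mcp_box), result)
```

No weights are updated anywhere in this flow: the student gains capability only through the contents of the box, which is the sense in which the distillation is training-free.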
