SkillX: Automatically Constructing Skill Knowledge Bases for Agents

Chenxi Wang, Zhuoyun Yu, Xin Xie, Wuguannan Yao, Runnan Fang, Shuofei Qiao, Kexin Cao, Guozhou Zheng, Xiang Qi, Peng Zhang, Shumin Deng

Abstract

Learning from experience is critical for building capable large language model (LLM) agents, yet prevailing self-evolving paradigms remain inefficient: agents learn in isolation and repeatedly rediscover similar behaviors from limited experience, which leads to redundant exploration and poor generalization. To address this problem, we propose SkillX, a fully automated framework for constructing a plug-and-play skill knowledge base that can be reused across agents and environments. Its pipeline builds on three synergistic innovations: (i) Multi-Level Skills Design, which distills raw trajectories into a three-tiered hierarchy of strategic plans, functional skills, and atomic skills; (ii) Iterative Skills Refinement, which automatically revises skills based on execution feedback to continuously improve library quality; and (iii) Exploratory Skills Expansion, which proactively generates and validates novel skills to expand coverage beyond the seed training data. Using a strong backbone agent (GLM-4.6), we automatically build a reusable skill library and evaluate its transferability on challenging long-horizon, user-interactive benchmarks, including AppWorld, BFCL-v3, and $\tau^2$-Bench. Experiments show that SkillKB consistently improves task success and execution efficiency when plugged into weaker base agents, highlighting the importance of structured, hierarchical experience representations for generalizable agent learning. Our code will be publicly available soon at https://github.com/zjunlp/SkillX.

Paper Structure

This paper contains 63 sections, 10 equations, 6 figures, 10 tables.

Figures (6)

  • Figure 1: Claude Skills follow a long-context, progressively disclosed format, which requires a complex sandboxing system and multiple interactions, thereby posing challenges to robust reasoning. In contrast, SkillX adopts a hierarchical, itemized representation that can be stored and retrieved via a lightweight retrieval module and injected into the system prompt in a single pass, making it easier to transfer across base models.
  • Figure 2: SkillX provides an automated, iterative pipeline for constructing a skills library, integrating skills extraction, skills expansion, and skills refinement. The skills library is organized into three levels: planning skills, functional skills, and atomic skills.
  • Figure 3: Comprehensive analysis of SkillX. (a) Performance of multi-level skills: models exhibit varying performance under different skill compositions. (b) Execution efficiency of multi-level skills: jointly composing all skills yields the best execution efficiency. (c) Iterative optimization: iterative skill refinement further improves performance. (d) Skill expansion strategies: experience-guided expansion achieves the best scalability and performance gains. (e) Analysis of input tokens: properly balancing input tokens is crucial for controlling inference cost. (f) Analysis of execution steps: experience-based learning reduces the number of execution steps.
  • Figure 4: AppWorld benchmark case study: updating a Spotify playlist based on roommates' suggestions. SkillX successfully handles API call sequences (a pagination pattern for playlist retrieval) and cross-app integration (combining the Spotify and Phone APIs), while the baseline without multi-level skills fails because of incorrect API call sequences and an inability to complete cross-app integration tasks.
  • Figure 5: BFCL benchmark case study: vehicle engine start safety check and Twitter posting. SkillX follows prerequisite sequences (lock doors $\rightarrow$ press brake pedal $\rightarrow$ start engine) and properly authenticates before posting tweets, while the baseline without multi-level skills fails by calling APIs without their prerequisites and encountering tool-calling errors.
  • ...and 1 more figure
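
The hierarchical, itemized representation described in the Figure 1 and Figure 2 captions can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the `Skill` record, the word-overlap scoring in `retrieve`, the prompt layout in `build_system_prompt`, and the example skills (loosely modeled on the case studies in Figures 4 and 5) are all hypothetical. The paper's actual retrieval module and skill format may differ.

```python
from __future__ import annotations

from dataclasses import dataclass

# The three levels named in the Figure 2 caption.
LEVELS = ("planning", "functional", "atomic")


@dataclass(frozen=True)
class Skill:
    """Hypothetical itemized skill record (field names are assumptions)."""
    level: str        # one of LEVELS
    name: str
    description: str


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words, trimming punctuation."""
    words = (w.strip(".,:;()").lower() for w in text.split())
    return {w for w in words if w}


def retrieve(library: list[Skill], task: str, k: int = 2) -> list[Skill]:
    """Return up to k skills per level, ranked by word overlap with the task.

    Word overlap is a stand-in for whatever lightweight retriever is used.
    """
    task_words = tokenize(task)
    selected: list[Skill] = []
    for level in LEVELS:
        scored = [
            (len(task_words & tokenize(f"{s.name} {s.description}")), s)
            for s in library
            if s.level == level
        ]
        # Drop skills with no overlap, then rank the rest by score.
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        selected.extend(s for _, s in scored[:k])
    return selected


def build_system_prompt(base_prompt: str, skills: list[Skill]) -> str:
    """Inject all retrieved skills into the system prompt in a single pass."""
    lines = [base_prompt, "", "Relevant skills:"]
    lines += [f"- [{s.level}] {s.name}: {s.description}" for s in skills]
    return "\n".join(lines)


# Hypothetical library entries, one per level.
library = [
    Skill("planning", "Update playlist from suggestions",
          "Collect suggestions from messages, then update the Spotify playlist."),
    Skill("functional", "Paginate playlist retrieval",
          "Request playlist pages until an empty page is returned."),
    Skill("atomic", "Start engine",
          "Lock doors, press brake pedal, then start engine."),
]

task = "Update my Spotify playlist based on roommates' suggestions"
skills = retrieve(library, task)
prompt = build_system_prompt("You are a helpful agent.", skills)
print(prompt)
```

Because the skills are short, independent items rather than one long progressively disclosed document, the whole selection fits into one prompt and does not require a sandbox or multiple retrieval rounds, which is the contrast the Figure 1 caption draws.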