SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning

Peng Xia, Jianwen Chen, Hanyang Wang, Jiaqi Liu, Kaide Zeng, Yu Wang, Siwei Han, Yiyang Zhou, Xujiang Zhao, Haifeng Chen, Zeyu Zheng, Cihang Xie, Huaxiu Yao

TL;DR

SkillRL tackles the inefficiency of learning from raw experience in LLM agents by introducing automatic skill discovery and recursive evolution. It distills diverse trajectories into a hierarchical SkillBank of general and task-specific skills, enabling adaptive retrieval that reduces context length while preserving reasoning utility. A recursive evolution loop continuously augments the skill library based on validation failures, ensuring co-evolution with the policy. Across ALFWorld, WebShop, and seven search-augmented tasks, SkillRL achieves state-of-the-art results with substantial improvements over memory-based and prompt-based baselines, demonstrating strong generalization and faster convergence in complex environments.

Abstract

Large Language Model (LLM) agents have shown stunning results in complex tasks, yet they often operate in isolation, failing to learn from past experiences. Existing memory-based methods primarily store raw trajectories, which are often redundant and noise-heavy. This prevents agents from extracting high-level, reusable behavioral patterns that are essential for generalization. In this paper, we propose SkillRL, a framework that bridges the gap between raw experience and policy improvement through automatic skill discovery and recursive evolution. Our approach introduces an experience-based distillation mechanism to build a hierarchical skill library, SkillBank; an adaptive retrieval strategy for general and task-specific heuristics; and a recursive evolution mechanism that allows the skill library to co-evolve with the agent's policy during reinforcement learning. These innovations significantly reduce the token footprint while enhancing reasoning utility. Experimental results on ALFWorld, WebShop, and seven search-augmented tasks demonstrate that SkillRL achieves state-of-the-art performance, outperforming strong baselines by over 15.3% and maintaining robustness as task complexity increases. Code is available at https://github.com/aiming-lab/SkillRL.
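
To make the idea of a hierarchical skill library with adaptive retrieval concrete, here is a minimal sketch. It is not the authors' implementation: the names `Skill`, `SkillBank`, `add`, and `retrieve` are hypothetical, and the token-overlap ranking is an assumption standing in for whatever retrieval scoring SkillRL actually uses. The sketch only illustrates the two-level structure described above, where general skills are always included and task-specific skills are selected per query.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Skill:
    # Hypothetical record: a short name plus a natural-language heuristic.
    name: str
    description: str


@dataclass
class SkillBank:
    # General skills apply to every task; task-specific skills are keyed by task type.
    general: list = field(default_factory=list)
    task_specific: dict = field(default_factory=dict)

    def add(self, skill, task_type=None):
        """Store a skill as general (no task type) or under its task type."""
        if task_type is None:
            self.general.append(skill)
        else:
            self.task_specific.setdefault(task_type, []).append(skill)

    def retrieve(self, task_type, query, k=2):
        """Return all general skills plus the top-k task-specific skills.

        Ranking uses word overlap with the query (an assumption made for
        this sketch, not the scoring used in the paper).
        """
        query_tokens = set(query.lower().split())

        def overlap(skill):
            return len(query_tokens & set(skill.description.lower().split()))

        candidates = self.task_specific.get(task_type, [])
        ranked = sorted(candidates, key=overlap, reverse=True)
        return list(self.general) + ranked[:k]


# Toy usage with made-up skills.
bank = SkillBank()
bank.add(Skill("verify_goal", "re-read the goal before acting"))
bank.add(Skill("heat_object", "use the microwave to heat an object"), task_type="heat")
bank.add(Skill("find_object", "search each cabinet for the object"), task_type="heat")
bank.add(Skill("cool_object", "use the fridge to cool an object"), task_type="cool")

selected = bank.retrieve("heat", "heat the mug with the microwave", k=1)
print([s.name for s in selected])
```

Only the retrieved skills would be placed in the prompt, which is the sense in which a compact skill abstraction can shorten context relative to retrieving raw trajectories.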

Paper Structure (23 sections, 9 equations, 6 figures, 8 tables, 1 algorithm)

Figures (6)

  • Figure 1: (a) Overview of the SkillRL pipeline. Unlike previous methods (gray dashed lines) that store raw trajectories and discard failures, SkillRL employs an experience-based distillation mechanism to transform diverse experiences into structured skills. (b) Performance on the ALFWorld validation set (Shridhar et al.). SkillRL achieves faster convergence and superior success rates compared to vanilla GRPO and memory-augmented RL.
  • Figure 2: Overview of the SkillRL framework. We collect trajectories using a base model, distill them into a hierarchical skill library, perform cold-start SFT to enable skill utilization, and then conduct RL training with dynamic skill evolution based on validation failures.
  • Figure 3: Evolution of skill library size during RL training. Dynamic skill evolution adds skills at validation checkpoints.
  • Figure 4: Comparison of prompt length (tokens) between raw memory retrieval and our distilled skill abstraction. SkillRL consistently reduces context overhead while maintaining reasoning utility.
  • Figure 5: Success rate on ALFWorld validation set. The recursive skill evolution significantly accelerates convergence and enhances the overall performance ceiling.
  • ...and 1 more figure
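
The captions for Figures 2 and 3 describe skills being added at validation checkpoints based on validation failures. The loop below is a rough sketch of that idea under stated assumptions: `evolve_skill_library` and `toy_distill` are hypothetical names, the episode dictionaries are made-up data, and distilling one skill per failed episode with exact-match deduplication is a simplification chosen here, not the procedure reported in the paper.

```python
def evolve_skill_library(skills, validation_results, distill):
    """Return a new skill list extended with skills distilled from failures.

    Successful episodes are skipped; a distilled skill is added only if it
    is not already in the library (assumed deduplication rule).
    """
    updated = list(skills)
    for episode in validation_results:
        if episode["success"]:
            continue
        new_skill = distill(episode)
        if new_skill not in updated:
            updated.append(new_skill)
    return updated


def toy_distill(episode):
    # Stand-in for an LLM-based distiller: name the skill after the failed task type.
    return f"avoid_failure_in_{episode['task_type']}"


# Two hypothetical validation checkpoints during RL training.
library = ["verify_goal"]
checkpoints = [
    [{"task_type": "heat", "success": False}, {"task_type": "cool", "success": True}],
    [{"task_type": "heat", "success": False}, {"task_type": "clean", "success": False}],
]

sizes = [len(library)]
for results in checkpoints:
    library = evolve_skill_library(library, results, toy_distill)
    sizes.append(len(library))

print(sizes)
print(library)
```

In this toy run the library grows only when a checkpoint exposes a failure type it has not covered yet, which mirrors the stepwise growth in library size that Figure 3 is described as showing.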