
SKILLFOUNDRY: Building Self-Evolving Agent Skill Libraries from Heterogeneous Scientific Resources

Shuaike Shen, Wenduo Cheng, Mingqian Ma, Alistair Turcan, Martin Jinye Zhang, Jian Ma

Abstract

Modern scientific ecosystems are rich in procedural knowledge across repositories, APIs, scripts, notebooks, documentation, databases, and papers, yet much of this knowledge remains fragmented across heterogeneous artifacts that agents cannot readily operationalize. This gap between abundant scientific know-how and usable agent capabilities is a key bottleneck for building effective scientific agents. We present SkillFoundry, a self-evolving framework that converts such resources into validated agent skills: reusable packages that encode task scope, inputs and outputs, execution steps, environment assumptions, provenance, and tests. SkillFoundry organizes a target domain as a domain knowledge tree, mines resources from high-value branches, extracts operational contracts, compiles them into executable skill packages, and then iteratively expands, repairs, merges, or prunes the resulting library through a closed-loop validation process. SkillFoundry produces a substantially novel and internally valid skill library, with 71.1% of mined skills differing from existing skill libraries such as SkillHub and SkillSMP. We demonstrate that these mined skills improve coding agent performance on five of the six MoSciBench datasets. We further show that SkillFoundry can design new task-specific skills on demand for concrete scientific objectives, and that the resulting skills substantially improve performance on two challenging genomics tasks: cell type annotation and the scDRS workflow. Together, these results show that automatically mined skills improve agent performance on benchmarks and domain-specific tasks, expand coverage beyond hand-crafted skill libraries, and provide a practical foundation for more capable scientific agents.
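The abstract describes an agent skill as a reusable package encoding task scope, inputs and outputs, execution steps, environment assumptions, provenance, and tests. A minimal sketch of such a record, assuming a simple dataclass schema (all field and function names here are illustrative, not SkillFoundry's actual format):

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a "skill package" as described in the abstract;
# the schema below is illustrative, not SkillFoundry's real on-disk format.
@dataclass
class SkillPackage:
    name: str
    task_scope: str                # which tasks the skill covers
    inputs: dict                   # expected input names -> types
    outputs: dict                  # produced output names -> types
    steps: list                    # ordered execution steps
    environment: list = field(default_factory=list)  # environment assumptions
    provenance: list = field(default_factory=list)   # source artifacts mined
    tests: list = field(default_factory=list)        # validation checks

    def validate(self) -> bool:
        """A skill is internally valid only if all attached tests pass."""
        return all(check(self) for check in self.tests)

# Toy example: a single skill with one trivial structural test.
skill = SkillPackage(
    name="normalize_counts",
    task_scope="normalize a single-cell count matrix",
    inputs={"counts": "matrix"},
    outputs={"normalized": "matrix"},
    steps=["load counts", "library-size normalize", "log1p transform"],
    environment=["scanpy>=1.9"],
    provenance=["github.com/example/notebook.ipynb"],
    tests=[lambda s: bool(s.steps) and bool(s.outputs)],
)
print(skill.validate())  # → True
```

In the closed-loop process the paper describes, a library-level pass would repair or prune packages whose `validate()` fails and merge near-duplicates, which is why the per-skill test list is part of the package itself.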


Paper Structure

This paper contains 26 sections, 8 figures, and 2 tables.

Figures (8)

  • Figure 1: Overview of SkillFoundry. Starting from a domain knowledge tree, the framework mines branch-relevant resources, extracts structured candidate skills, validates them through multi-level testing, expands the tree with verified skills, and prunes redundant or low-value leaves.
  • Figure 2: Composition of the mined skill library and runtime of the mining pipeline. (a) Distribution of the 286 mined skills across domains and subdomains. (b) Average runtime of the major pipeline stages under a fixed resource-mining budget.
  • Figure 3: UMAP of cell-type annotations from ground truth, Codex (without skill), Codex+SkillFoundry, and SpatialAgent. SMCs denote smooth muscle cells. Labels are harmonized into eight major cell types, following the SpatialAgent paper (wang2025spatialagent). Coverage is the fraction of cells assigned to one of the eight major cell types, and accuracy is the fraction whose predicted label matches the harmonized ground truth.
  • Figure 4: Quantitative and qualitative evaluation of Biomni on the scDRS workflow, with and without SkillFoundry skills. (a) RMSE between agent and expert outputs across three replicates. SkillFoundry reduces error overall, with two runs exactly matching the expert output. (b) Expert qualitative scores across the same replicates. A score of 7 indicates that all evaluation criteria are met. SkillFoundry produces the only run that satisfies all qualitative criteria.
  • Figure 5: Representative scDRS output generated by Biomni without SkillFoundry skills. The figure ranks cell types by association strength and marks nominal significance, but does not summarize FDR-supported signal or within-cell-type heterogeneity.
  • ...and 3 more figures