ToolRosetta: Bridging Open-Source Repositories and Large Language Model Agents through Automated Tool Standardization

Shimin Di, Xujie Yuan, Hanghui Guo, Chaoqian Ouyang, Zhangze Chen, Ling Yue, Libin Zheng, Jia Zhu, Shaowu Pan, Jian Yin, Min-Ling Zhang, Yong Rui

TL;DR

This paper proposes ToolRosetta, a unified framework that automatically translates open-source code repositories and APIs into MCP-compatible tools that LLMs can invoke reliably. It also incorporates a security inspection layer to mitigate the risks inherent in executing arbitrary code.

Abstract

Reusing and invoking existing code remains costly and unreliable, as most practical tools are embedded in heterogeneous code repositories and lack standardized, executable interfaces. Although large language models (LLMs) and Model Context Protocol (MCP)-based tool invocation frameworks enable natural language task execution, current approaches rely heavily on manual tool curation and standardization, which fundamentally limits scalability. In this paper, we propose ToolRosetta, a unified framework that automatically translates open-source code repositories and APIs into MCP-compatible tools that can be reliably invoked by LLMs. Given a user task, ToolRosetta autonomously plans toolchains, identifies relevant codebases, and converts them into executable MCP services, enabling end-to-end task completion with minimal human intervention. In addition, ToolRosetta incorporates a security inspection layer to mitigate risks inherent in executing arbitrary code. Extensive experiments across diverse scientific domains demonstrate that ToolRosetta can automatically standardize a large number of open-source tools and reduce the human effort required for code reproduction and deployment. Notably, by seamlessly leveraging specialized open-source tools, ToolRosetta-powered agents consistently improve task completion performance compared to commercial LLMs and existing agent systems.
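
To make the idea of a "standardized, executable interface" concrete, here is a minimal sketch of wrapping one repository function as an MCP-style tool descriptor. It is not ToolRosetta's implementation: `describe_tool`, `call_tool`, and the toy `molecular_weight` function are hypothetical names introduced for illustration, and only the descriptor fields `name`, `description`, and `inputSchema` follow the MCP tool definition. The sketch uses the Python standard library only and covers a small subset of type annotations.

```python
import inspect
from typing import Any, Callable, Dict, get_type_hints

# Illustrative subset: map Python annotations to JSON Schema type names.
_JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def describe_tool(func: Callable[..., Any]) -> Dict[str, Any]:
    """Derive an MCP-style descriptor from a function's signature and docstring."""
    hints = get_type_hints(func)
    properties: Dict[str, Any] = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        # Unannotated or unsupported types fall back to "string" (an assumption).
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "inputSchema": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


def call_tool(func: Callable[..., Any], arguments: Dict[str, Any]) -> Any:
    """Check that required arguments are present, then invoke the function."""
    schema = describe_tool(func)["inputSchema"]
    missing = [key for key in schema["required"] if key not in arguments]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return func(**arguments)


# Hypothetical repository function standing in for real open-source code.
def molecular_weight(formula: str, precision: int = 2) -> float:
    """Return a toy molecular weight for a formula of single-letter elements."""
    masses = {"H": 1.008, "C": 12.011, "O": 15.999}
    return round(sum(masses[ch] for ch in formula), precision)


if __name__ == "__main__":
    print(describe_tool(molecular_weight))
    print(call_tool(molecular_weight, {"formula": "HHO"}))
```

An agent would read the descriptor to decide which arguments to supply and then send them through `call_tool`. A real conversion pipeline would additionally have to resolve dependencies, set up the environment, and inspect the code for security risks, none of which this sketch attempts.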

Paper Structure

This paper contains 21 sections, 2 equations, 7 figures.

Figures (7)

  • Figure 1: Overview of ToolRosetta. a) ToolRosetta automatically encapsulates existing open-source libraries as MCP services, so that a wide range of specialized problems can be solved by calling tools. Unlike earlier fixed tool and skill libraries, the approach is automated, scalable, and requires no human involvement. b) The pipeline for transforming code into MCP services.
  • Figure 2: Overview of the ToolRosetta ecosystem. Each node represents a GitHub repository automatically converted into standardized MCP tool services, organized into five major scientific areas (Physical Sciences, Earth & Environmental Sciences, Biological Sciences, Health Sciences, and Scientific Community & Society) and Computer Science. Node groupings reflect sub-domain categorization within each area.
  • Figure 3: Automated tool conversion performance evaluation. a, Radial bar chart of average completion time per subdiscipline for ToolRosetta, human experts, and the GPT-4o service-only baseline. b, Repository-level conversion success rates across 35 subdisciplines within six domains. Here GPT-4o denotes the service-only baseline, which generates only MCP_service.py; the harder end-to-end full-file baseline is discussed only in the text. c, Joint visualization of success rate versus completion time, where each point represents one subdiscipline. d, Repair-focused ablation showing cumulative improvement from successive rounds of the Review-Revise-Fix (RRF) mechanism across six domains. e, Dominant failure types and repairability statistics.
  • Figure 4: Downstream task evaluation on the RosettaEval benchmark. a, Task completion accuracy of ToolRosetta and four baseline systems (SciToolAgent, ChemCrow, RepoMaster, OpenAgents) across 35 subdisciplines spanning six domains. Stars denote 21 out-of-distribution (OOD) subdomains that are not covered by curated baseline tool sets. b, Average task success rate per scientific domain for all five systems. c, Performance gain when integrating ToolRosetta-converted tools into two existing agent frameworks (OpenAgents and RepoMaster), with percentage annotations indicating absolute improvement per domain.
  • Figure 5: Workflow and results of stroke analysis using ToolRosetta.
  • ...and 2 more figures