ToolFactory: Automating Tool Generation by Leveraging LLM to Understand REST API Documentations

Xinyi Ni, Qiuyang Wang, Yukun Zhang, Pengyu Hong

TL;DR

ToolFactory provides an end-to-end, open-source pipeline to automatically generate AI-usable tools from diverse REST API documents, addressing the lack of standardization in scientific APIs. It combines APILLAMA for structured information extraction, a JSON-to-tool generation process, rigorous tool validation, and a knowledge-base–driven parameter value inference mechanism. The API Extraction Benchmark and the glycomaterials case study demonstrate that APILLAMA achieves strong structural accuracy while enabling practical tool creation, cross-database integration, and domain-agnostic applicability. This work lowers the development and learning costs for integrating scientific REST APIs into AI workflows, enabling domain-specific agents to operate with reduced manual engineering effort.

Abstract

LLM-based tool agents offer natural language interfaces, enabling users to seamlessly interact with computing services. While REST APIs are valuable resources for building such agents, they must first be transformed into AI-compatible tools. Automatically generating AI-compatible tools from REST API documents can greatly streamline tool agent development and minimize user learning curves. However, API documentation often suffers from a lack of standardization, inconsistent schemas, and incomplete information. To address these issues, we developed ToolFactory, an open-source pipeline for automating tool generation from unstructured API documents. To enhance the reliability of the developed tools, we implemented an evaluation method to diagnose errors. Furthermore, we built a knowledge base of verified tools, which we leveraged to infer missing information from poorly documented APIs. We developed the API Extraction Benchmark, comprising 167 API documents and 744 endpoints in various formats, and designed a JSON schema to annotate them. This annotated dataset was utilized to train and validate ToolFactory. The experimental results highlight the effectiveness of ToolFactory. We also demonstrated ToolFactory by creating a domain-specific AI agent for glycomaterials research. ToolFactory exhibits significant potential for facilitating the seamless integration of scientific REST APIs into AI workflows.

Paper Structure

This paper contains 30 sections, 2 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: The API Extraction Benchmark includes API documents with varying levels of structure. For example, the API document shown on the left follows a standardized schema. The one on the right is less structured, as several fields are described in free text. Our dataset prioritizes API variety and emphasizes the less structured cases. The diverse document structures in our dataset necessitate a general tool generation pipeline capable of processing various document formats effectively.
  • Figure 2: ToolFactory: Automated pipeline for generating AI-usable tools from API documentation. Free-text API documentation is processed by APILlama to extract structured information, which is used to build tools that interact with the corresponding APIs.
  • Figure 3: A parameter database is constructed using validated tools, enabling parameter value inference based on the semantic similarity of parameter keys and descriptions.
  • Figure 4: AI Agent for Glycomaterial Research with Automated Tool Generation. By automating tool generation, the AI agent simplifies database access and supports glycan-related tasks such as searching, drawing, and format conversion. ToolFactory generated 92 validated AI-usable tools for various tasks across several databases. A web demo was developed using the OpenAgents framework.
  • Figure B1: Example of API documentation of each category. (left) is an example of organized API documentation https://documentation.image-charts.com/?utm_source=apislist.com
  • ...and 1 more figures
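
The parameter value inference described in the Figure 3 caption can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a bag-of-words cosine similarity as a stand-in for whatever semantic similarity measure ToolFactory uses, and the database entries, the values, the `infer_value` function, and the 0.3 threshold are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; camelCase and snake_case keys are split apart."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z0-9]+", spaced.lower())


def vectorize(key, description):
    """Bag-of-words vector over a parameter's key and description."""
    return Counter(tokenize(key) + tokenize(description))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical parameter database built from already-validated tools.
# The values are placeholders, not real identifiers.
PARAMETER_DB = [
    {"key": "glycan_id",
     "description": "accession identifier of the glycan",
     "value": "EXAMPLE-GLYCAN-ID"},
    {"key": "page_size",
     "description": "number of results per page",
     "value": "10"},
]


def infer_value(key, description, db=PARAMETER_DB, threshold=0.3):
    """Return the value of the most similar known parameter, or None."""
    query = vectorize(key, description)
    best_score, best_value = 0.0, None
    for entry in db:
        score = cosine(query, vectorize(entry["key"], entry["description"]))
        if score > best_score:
            best_score, best_value = score, entry["value"]
    # Below the (assumed) threshold, no inference is made.
    return best_value if best_score >= threshold else None


if __name__ == "__main__":
    print(infer_value("glycanId", "identifier of the glycan to fetch"))
    print(infer_value("color", "hex colour code"))
```

A real system would likely replace the bag-of-words vectors with learned text embeddings; the lookup-by-nearest-neighbour structure would stay the same.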