LRASGen: LLM-based RESTful API Specification Generation
Sida Deng, Rubing Huang, Man Zhang, Chenhui Cui, Dave Towey, Rongcun Wang
TL;DR
LRASGen tackles the challenge of generating accurate OpenAPI specifications directly from RESTful API source code using Large Language Models (LLMs). By decomposing the workflow into endpoint detection, code extraction, parameter/constraint analysis, and OpenAPI generation, LRASGen achieves high precision and recall across Java, Python, and C# APIs. It significantly outperforms existing tools and uncovers many behaviors missing from developer-provided specifications. The approach uses GPT-4o mini and DeepSeek V3 in a cross-language setup and introduces an enhanced ground truth, GT*, to evaluate performance. The results show that LRASGen can produce complete, standards-compliant specifications with strong coverage of parameter constraints and responses, enabling more reliable API testing and integration. The work highlights practical implications for automated API documentation and points to future improvements in token management, prompt design, and broader language and framework support.
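The precision and recall figures above compare the entities in a generated specification against a ground truth such as GT*. The sketch below shows that comparison, assuming each specification has already been flattened into a set of hashable entities. The `(method, path, kind, name)` tuple encoding and the `precision_recall` helper are hypothetical illustrations, not the paper's actual evaluation code.

```python
from typing import Set, Tuple

# Hypothetical entity encoding: (HTTP method, path, entity kind, entity name).
Entity = Tuple[str, str, str, str]


def precision_recall(generated: Set[Entity], ground_truth: Set[Entity]) -> Tuple[float, float]:
    """Return (precision, recall) of generated entities against a ground-truth set."""
    true_positives = len(generated & ground_truth)
    # Precision: share of generated entities that are correct.
    precision = true_positives / len(generated) if generated else 0.0
    # Recall: share of ground-truth entities that were generated.
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


if __name__ == "__main__":
    gt = {
        ("GET", "/users/{id}", "endpoint", ""),
        ("GET", "/users/{id}", "parameter", "id"),
        ("GET", "/users/{id}", "constraint", "id:minimum=1"),
        ("GET", "/users/{id}", "response", "200"),
        ("GET", "/users/{id}", "response", "404"),
    }
    gen = {
        ("GET", "/users/{id}", "endpoint", ""),
        ("GET", "/users/{id}", "parameter", "id"),
        ("GET", "/users/{id}", "response", "200"),
        ("GET", "/users/{id}", "parameter", "verbose"),  # not in the ground truth
    }
    p, r = precision_recall(gen, gt)
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.60
```

With this encoding, a missed constraint or response lowers recall, and a spurious parameter lowers precision. How entities are actually matched (for example, exact versus normalized paths) is a design decision that this sketch leaves out.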
Abstract
REpresentational State Transfer (REST) is an architectural style for designing web applications that enables scalable, stateless communication between clients and servers via standard HTTP methods. Web APIs that employ the REST style are known as RESTful (or REST) APIs. When using or testing a RESTful API, developers may need its specification, which is often defined using open standards such as the OpenAPI Specification (OAS). However, writing and updating these specifications can be very time-consuming and error-prone, which may hinder the use of RESTful APIs, especially when software requirements change. Many tools and methods, such as Respector and Swagger Core, have been proposed to address this problem. OAS generation can be regarded as a common text-generation task that creates a formal description of API endpoints derived from the source code. Large Language Models (LLMs), which have strong capabilities in both code understanding and text generation, are therefore a potential solution. Motivated by this, we propose a novel approach for generating the OASs of RESTful APIs using LLMs: LLM-based RESTful API-Specification Generation (LRASGen). To the best of our knowledge, this is the first use of LLMs and API source code to generate OASs for RESTful APIs. Compared with existing tools and methods, LRASGen can generate OASs even when the implementation is incomplete (for example, with partial code or missing annotations or comments). To evaluate LRASGen's performance, we conducted a series of empirical studies on 20 real-world RESTful APIs. The results show that both LLMs (GPT-4o mini and DeepSeek V3) can support LRASGen in generating accurate specifications, and that LRASGen-generated specifications cover an average of 48.85% more missed entities than the developer-provided specifications.
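To make the idea of "a formal description of API endpoints derived from the source code" concrete, the sketch below maps Flask-style route decorators to a minimal OpenAPI 3.0 document. This is a toy, rule-based stand-in chosen for illustration. LRASGen itself uses an LLM for detection and analysis, which is what lets it handle incomplete code. The regexes, the `detect_endpoints` and `to_openapi` helpers, and the placeholder `200` response are all hypothetical, and the script only recognizes double-quoted route strings.

```python
import json
import re

# Matches route decorators such as @app.route("/users/<int:user_id>", methods=["GET"]).
ROUTE_RE = re.compile(r'@app\.route\(\s*"([^"]+)"(?:\s*,\s*methods\s*=\s*\[([^\]]*)\])?')
# Matches Flask path converters such as <int:user_id> or <name>.
PARAM_RE = re.compile(r"<(?:(\w+):)?(\w+)>")
# Converter -> OpenAPI schema type; anything else falls back to "string".
TYPE_MAP = {"int": "integer", "float": "number"}


def detect_endpoints(source: str):
    """Rule-based stand-in for endpoint detection: return (path, [verbs]) pairs."""
    endpoints = []
    for match in ROUTE_RE.finditer(source):
        path, methods = match.group(1), match.group(2)
        verbs = re.findall(r'"(\w+)"', methods) if methods else ["GET"]
        endpoints.append((path, [v.lower() for v in verbs]))
    return endpoints


def to_openapi(endpoints, title="Example API"):
    """Assemble a minimal OpenAPI 3.0 document from detected endpoints."""
    paths = {}
    for path, verbs in endpoints:
        params = [
            {"name": name, "in": "path", "required": True,
             "schema": {"type": TYPE_MAP.get(conv, "string")}}
            for conv, name in PARAM_RE.findall(path)
        ]
        oas_path = PARAM_RE.sub(r"{\2}", path)  # <int:user_id> -> {user_id}
        for verb in verbs:
            paths.setdefault(oas_path, {})[verb] = {
                "parameters": params,
                "responses": {"200": {"description": "OK"}},  # placeholder response
            }
    return {"openapi": "3.0.3", "info": {"title": title, "version": "1.0.0"}, "paths": paths}


SOURCE = '''
@app.route("/users/<int:user_id>", methods=["GET", "DELETE"])
def user(user_id): ...

@app.route("/health")
def health(): ...
'''

if __name__ == "__main__":
    print(json.dumps(to_openapi(detect_endpoints(SOURCE)), indent=2))
```

A rule-based extractor like this one misses anything its patterns do not anticipate, such as constraints checked inside the handler body or error responses. That gap is the kind of incompleteness the abstract targets by using LLMs instead.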
