Test Amplification for REST APIs Using "Out-of-the-box" Large Language Models
Tolgahan Bardakci, Serge Demeyer, Mutlu Beyazit
TL;DR
The paper uses out-of-the-box large language models (ChatGPT 3.5, ChatGPT 4, and GitHub Copilot) to amplify an existing REST API test suite, with PetStore as a representative API. It compares prompts and models on coverage, readability, and post-processing effort. The strongest coverage and bug exposure come from Prompt 3, which supplies an OpenAPI specification and pushes for many amplified tests, especially with GPT-4 and Copilot. The study shows that LLM-driven amplification can produce readable, actionable tests with manageable post-processing, and it provides guidelines on prompt design. For API quality assurance, the work suggests concrete prompts and workflows for integrating LLM-generated tests into CI/CD pipelines and pull requests.
Abstract
REST APIs (Representational State Transfer Application Programming Interfaces) are an indispensable building block in today's cloud-native applications, so testing them is critically important. However, writing automated tests for REST APIs is challenging: the tests must be both strong and readable, and they must exercise the boundary values of the protocol embedded in the REST API. In this paper, we report our experience with using "out-of-the-box" large language models (ChatGPT and GitHub's Copilot) to amplify REST API test suites. We compare the resulting tests based on coverage and understandability, and we derive a series of guidelines and lessons learned concerning the prompts that result in the strongest test suite.
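
To make the idea of a specification-based amplification prompt concrete, the following is a minimal sketch of how such a prompt might be assembled. It is not the authors' prompt: the function name build_amplification_prompt, the min_tests parameter, the prompt wording, the toy specification fragment, and the sample test are all hypothetical and serve only as an illustration.

```python
import json


def build_amplification_prompt(openapi_spec: dict, existing_test: str, min_tests: int = 10) -> str:
    """Assemble a plain-text prompt asking an LLM to amplify a REST API test.

    Hypothetical helper: the wording below is an assumption, not the
    prompt used in the paper.
    """
    # Serialize the specification deterministically so the prompt is reproducible.
    spec_text = json.dumps(openapi_spec, indent=2, sort_keys=True)
    sections = [
        "You are given an OpenAPI specification and an existing REST API test.",
        f"OpenAPI specification:\n{spec_text}",
        f"Existing test:\n{existing_test}",
        (
            f"Write at least {min_tests} additional tests that exercise "
            "boundary values and error responses. Keep each test readable "
            "and independent of the others."
        ),
    ]
    return "\n\n".join(sections)


# Toy specification fragment in the style of PetStore (illustrative only).
spec = {
    "paths": {
        "/pet/{petId}": {
            "get": {
                "parameters": [
                    {"name": "petId", "in": "path", "schema": {"type": "integer", "format": "int64"}}
                ]
            }
        }
    }
}

# A hypothetical existing test, shown as text because it is only pasted into the prompt.
existing = "def test_get_pet():\n    assert get('/pet/1').status_code == 200"

prompt = build_amplification_prompt(spec, existing, min_tests=15)
```

The resulting prompt string would then be pasted into the chat interface of the chosen model, and the generated tests reviewed and post-processed before being added to the suite.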
