LLM-assisted Mutation for Whitebox API Testing

Jia Li, Jiacheng Shen, Yuxin Su, Michael R. Lyu

TL;DR

This work addresses fitness plateaus in white-box API testing by introducing MioHint, an LLM-assisted mutation framework that uses statement-level data-dependency analysis to retrieve minimal global code context. By integrating LLM-guided mutations into EvoMaster’s search-based software testing (SBST) loop, MioHint raises line coverage by an average of 4.95% absolute and mutation accuracy by a factor of 67 across 16 real-world REST services, while covering over 57% of hard-to-cover targets. A key contribution is the statement-level, def-use-based context extraction, paired with a structured six-part LLM prompt that supports in-context learning and JSON-formatted outputs. The approach demonstrates practical improvements for cloud API reliability and provides open-source tooling to facilitate replication and adoption in repository-scale testing scenarios.

Abstract

Cloud applications rely heavily on APIs to communicate with each other and exchange data. To ensure the reliability of cloud applications, cloud providers widely adopt API testing techniques. Unfortunately, existing API testing approaches struggle to reach code guarded by strict conditions, a problem known as fitness plateaus, because coverage metrics provide no gradient to guide the search. To address this issue, we propose MioHint, a novel white-box API testing approach that leverages the code comprehension capabilities of Large Language Models (LLMs) to boost API testing. The key challenge of LLM-based API testing lies in system-level testing, which emphasizes the dependencies between requests and targets across functions and files, thereby making the entire codebase the object of analysis. However, feeding the entire codebase to an LLM is impractical due to its limited context length and short memory. MioHint addresses this challenge by combining static analysis with LLMs. We retrieve relevant code with data-dependency analysis at the statement level, including def-use analysis for variables used in the target and function expansion for subfunctions called by the target. To evaluate the effectiveness of our method, we conducted experiments across 16 real-world REST API services. The findings reveal that MioHint achieves an average increase of 4.95% absolute in line coverage compared to the baseline, EvoMaster, alongside a 67x improvement in mutation accuracy. Furthermore, our method covers over 57% of hard-to-cover targets, whereas the baseline covers fewer than 10%.
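The statement-level retrieval described in the abstract can be illustrated with a small sketch. It is not MioHint's implementation: it analyzes Python source with the standard `ast` module purely for illustration, treats the def-use step as a flow-insensitive backward pass inside one function, and expands only helpers defined in the same module. The names `retrieve_context`, `SOURCE`, `handler`, and `is_admin` are hypothetical.

```python
import ast

# Hypothetical program under test; line 9 holds the hard-to-cover target.
SOURCE = '''\
def is_admin(role):
    return role == "admin"

def handler(request):
    user = request["user"]
    role = user["role"]
    page = request["page"]
    log = []
    if is_admin(role):
        return "secret"
    return page
'''

def header(stmt):
    """For if/while, analyze only the condition; otherwise the whole statement."""
    return stmt.test if isinstance(stmt, (ast.If, ast.While)) else stmt

def loads(node):
    """Names read by the node."""
    return {n.id for n in ast.walk(node)
            if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)}

def stores(node):
    """Names written by the node."""
    return {n.id for n in ast.walk(node)
            if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)}

def retrieve_context(source, target_line):
    tree = ast.parse(source)
    funcs = [f for f in tree.body if isinstance(f, ast.FunctionDef)]
    # Function that encloses the target line.
    func = next(f for f in funcs if f.lineno <= target_line <= f.end_lineno)
    stmts = sorted((n for n in ast.walk(func)
                    if isinstance(n, ast.stmt) and n is not func),
                   key=lambda n: n.lineno)
    target = next(s for s in stmts if s.lineno == target_line)

    # Def-use step: walk backwards and keep every statement that defines a
    # variable the target (transitively) uses. Definitions are never "killed",
    # so this over-approximates when branches are involved (an assumption made
    # here for simplicity).
    needed = loads(header(target))
    keep = {target.lineno}
    for stmt in reversed([s for s in stmts if s.lineno < target_line]):
        if stores(header(stmt)) & needed:
            keep.add(stmt.lineno)
            needed |= loads(header(stmt))

    # Function expansion step: include the source of helpers the target calls.
    called = {c.func.id for c in ast.walk(header(target))
              if isinstance(c, ast.Call) and isinstance(c.func, ast.Name)}
    helpers = [ast.get_source_segment(source, f) for f in funcs if f.name in called]

    params = {a.arg for a in func.args.args}
    lines = source.splitlines()
    return {
        "inputs": sorted(needed & params),  # request-derived parameters
        "statements": [lines[i - 1].strip() for i in sorted(keep)],
        "helpers": helpers,
    }

context = retrieve_context(SOURCE, target_line=9)
print(context)
```

In this toy case the retrieved context keeps the two assignments that feed `role`, the target condition, and the body of `is_admin`, while dropping the unrelated `page` and `log` statements. That is the kind of reduced context that could be placed in a prompt instead of the whole codebase.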

Paper Structure

This paper contains 20 sections, 7 figures, 4 tables, and 2 algorithms.

Figures (7)

  • Figure 1: Web Service and API.
  • Figure 2: High-level view of Search-based API Testing.
  • Figure 3: Example run of MioHint on a program under test. When search-based testing encounters fitness plateaus where the mutation is inefficient (1), MioHint queries GPT-4o for a mutation hint (2), and mutates this test case according to the hint (3).
  • Figure 4: Overview of MioHint’s framework for API test generation.
  • Figure 5: Targets in MioHint.
  • ...and 2 more figures