LLMorpheus: Mutation Testing using Large Language Models

Frank Tip; Jonathan Bell; Max Schaefer

TL;DR

LLMorpheus introduces PLACEHOLDER-driven mutation testing: it prompts LLMs to generate buggy replacements at designated code locations, going beyond a fixed set of mutation operators. The system couples a prompt generator, a mutant extractor, and a customized version of StrykerJS, and is evaluated on 13 JavaScript/TypeScript projects with multiple open and proprietary LLMs. The evaluation shows that LLM-generated mutants can resemble real-world bugs that traditional mutators cannot produce. Most surviving mutants represent behavioral changes, while a sizable minority are equivalent to the original code. The work also analyzes prompting strategies, differences between LLMs, and cost, showing practical feasibility and suggesting that static analysis could prune equivalent mutants to support adoption at scale.

Abstract

In mutation testing, the quality of a test suite is evaluated by introducing faults into a program and determining whether the program's tests detect them. Most existing approaches to mutation testing apply a fixed set of mutation operators, e.g., replacing a "+" with a "-", or removing a function's body. However, certain types of real-world bugs cannot easily be simulated by such approaches, limiting their effectiveness. This paper presents a technique for mutation testing in which placeholders are introduced at designated locations in a program's source code and a Large Language Model (LLM) is asked what they could be replaced with. The technique is implemented in LLMorpheus, a mutation testing tool for JavaScript, and evaluated on 13 subject packages, considering several variations on the prompting strategy and several LLMs. We find LLMorpheus to be capable of producing mutants that resemble existing bugs that cannot be produced by StrykerJS, a state-of-the-art mutation testing tool. Moreover, we report on the running time, cost, and number of mutants produced by LLMorpheus, demonstrating its practicality.
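The placeholder idea described in the abstract can be illustrated with a minimal TypeScript sketch. It is not taken from LLMorpheus: the names `MutationSite`, `insertPlaceholder`, `buildPrompt`, and `applyMutation`, the `<PLACEHOLDER>` token spelling, and the prompt wording are all assumptions made for illustration, and the LLM call itself is left out.

```typescript
// Hypothetical sketch of placeholder-based mutation; none of these names
// come from LLMorpheus itself.

// A source range to be mutated, as character offsets into the file.
interface MutationSite {
  start: number; // inclusive
  end: number;   // exclusive
}

// Assumed token spelling; the real tool may use a different marker.
const PLACEHOLDER = "<PLACEHOLDER>";

// Replace the code at the given site with the placeholder token.
function insertPlaceholder(source: string, site: MutationSite): string {
  return source.slice(0, site.start) + PLACEHOLDER + source.slice(site.end);
}

// Build an illustrative prompt asking an LLM for buggy replacements.
function buildPrompt(source: string, site: MutationSite): string {
  const original = source.slice(site.start, site.end);
  return [
    "The following code contains a placeholder:",
    insertPlaceholder(source, site),
    `The original code at the placeholder was: ${original}`,
    `Suggest replacements for ${PLACEHOLDER} that change the program's behavior.`,
  ].join("\n");
}

// Turn one suggested replacement into a mutant of the original source.
function applyMutation(
  source: string,
  site: MutationSite,
  replacement: string
): string {
  return source.slice(0, site.start) + replacement + source.slice(site.end);
}

// Example: mutate the expression `a + b` inside a small function.
const source = "function add(a, b) { return a + b; }";
const expr = "a + b";
const start = source.indexOf(expr);
const site: MutationSite = { start, end: start + expr.length };

console.log(buildPrompt(source, site));
// A replacement such as "a - b" stands in for a suggestion returned by an LLM.
console.log(applyMutation(source, site, "a - b"));
```

Each resulting mutant would then be run against the test suite: a mutant that makes at least one test fail is killed, and one that passes all tests survives.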

Paper Structure (58 sections, 15 figures, 159 tables)

Introduction
Background and Motivation
Example 1.
Example 2.
Example 3.
Example 4.
Discussion
Approach
Prompt generator.
Mutant generator.
Custom version of StrykerJS.
Pragmatics
Evaluation
Research Questions
Experimental Setup
...and 43 more sections

Figures (15)

Figure 1: (a) Fix for a bug reported in issue #36 in . (b) A mutation suggested by LLMorpheus at the same line that involves replacing read-access with write-access. (c) A mutation suggested by LLMorpheus elsewhere in the same file that mirrors the change made by the developer.
Figure 2: (a) Fix for a bug reported in issue #27 in . (b) A mutation suggested by LLMorpheus at the same location that similarly involves calling a different function.
Figure 3: A mutation suggested by LLMorpheus that involves associating an event listener with the event instead of with the event.
Figure 4: Overview of approach.
Figure 5: Illustration of the insertion of placeholders to direct the LLM at source locations that need to be mutated.
...and 10 more figures
