LLMorpheus: Mutation Testing using Large Language Models
Frank Tip, Jonathan Bell, Max Schaefer
TL;DR
LLMorpheus introduces placeholder-driven mutation testing: instead of applying a fixed set of mutation operators, it prompts an LLM to suggest buggy replacements for code at designated locations. The system couples a prompt generator, a mutant extractor, and a customized version of StrykerJS, and is evaluated on 13 JavaScript/TypeScript projects using several open and proprietary LLMs. The results show that LLM-generated mutants can resemble real-world bugs that traditional mutators cannot produce. Most surviving mutants represent behavioral changes, while a sizable minority are equivalent to the original code. The work also analyzes prompting strategies, differences between LLMs, and cost, showing that the approach is practically feasible, and suggests pruning equivalent mutants via static analysis as a step toward scalable adoption.
Abstract
In mutation testing, the quality of a test suite is evaluated by introducing faults into a program and determining whether the program's tests detect them. Most existing approaches for mutation testing involve the application of a fixed set of mutation operators, e.g., replacing a "+" with a "-", or removing a function's body. However, certain types of real-world bugs cannot easily be simulated by such approaches, limiting their effectiveness. This paper presents a technique for mutation testing where placeholders are introduced at designated locations in a program's source code and where a Large Language Model (LLM) is prompted to suggest what they could be replaced with. The technique is implemented in LLMorpheus, a mutation testing tool for JavaScript, and evaluated on 13 subject packages, considering several variations on the prompting strategy, and using several LLMs. We find LLMorpheus to be capable of producing mutants that resemble existing bugs that cannot be produced by StrykerJS, a state-of-the-art mutation testing tool. Moreover, we report on the running time, cost, and number of mutants produced by LLMorpheus, demonstrating its practicality.
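To make the placeholder idea concrete, the following is a minimal sketch of the three steps the abstract describes: marking a location with a placeholder, building a prompt around it, and substituting a suggested replacement to obtain a mutant. It is an illustration under stated assumptions, not the LLMorpheus implementation: the names `Span`, `insertPlaceholder`, `buildPrompt`, and `applyReplacement`, the `<PLACEHOLDER>` token spelling, the prompt wording, and the hard-coded `suggestions` array (which stands in for an actual LLM response) are all hypothetical.

```typescript
// Hypothetical sketch of placeholder-based mutation; none of these names
// are taken from LLMorpheus or StrykerJS.

// Half-open range of character offsets into the source text.
interface Span {
  start: number;
  end: number;
}

// Assumed token spelling; the real prompt format may differ.
const PLACEHOLDER = "<PLACEHOLDER>";

// Replace the code in `span` with the placeholder token.
function insertPlaceholder(source: string, span: Span): string {
  return source.slice(0, span.start) + PLACEHOLDER + source.slice(span.end);
}

// Build a prompt that shows the code with the placeholder and asks for
// replacements. The wording here is illustrative only.
function buildPrompt(source: string, span: Span): string {
  const original = source.slice(span.start, span.end);
  return [
    "The following code contains a placeholder:",
    insertPlaceholder(source, span),
    `The original code at the placeholder was: ${original}`,
    "Suggest replacements that would change the program's behavior.",
  ].join("\n");
}

// Substitute one suggested replacement into the original source,
// yielding a mutant.
function applyReplacement(source: string, span: Span, replacement: string): string {
  return source.slice(0, span.start) + replacement + source.slice(span.end);
}

const source =
  "function inRange(i: number, n: number) { return i >= 0 && i < n; }";
const target = "i < n";
const start = source.indexOf(target);
const span: Span = { start, end: start + target.length };

const prompt = buildPrompt(source, span);

// Stand-in for an LLM response; a real tool would parse these from the
// model's completion.
const suggestions = ["i <= n", "i < n - 1", "i < n"];

// Discard suggestions identical to the original code, then build mutants.
const mutants = suggestions
  .filter((s) => s !== target)
  .map((s) => applyReplacement(source, span, s));

console.log(prompt);
console.log(mutants);
```

Each resulting mutant would then be run against the project's test suite. A suggestion such as `i <= n` is an off-by-one change, which a fixed operator set may or may not cover depending on the tool. Filtering out textually identical suggestions removes only the most trivial equivalent mutants; semantically equivalent ones would still need separate analysis.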
