LLM Assisted Coding with Metamorphic Specification Mutation Agent

Mostafijur Rahman Akhond, Gias Uddin

TL;DR

CodeMetaAgent (CMA) introduces metamorphic relation–driven reasoning as an integrated agentic framework for LLM-assisted software engineering. By coordinating Mutator, Reviewer, Generator, and Evaluator modules, CMA transforms task descriptions and test inputs into semantically aligned variations, improving code generation accuracy and test coverage. Empirical results on HumanEval-Pro, MBPP-Pro, and SWE-Bench-Lite across diverse LLMs show improvements of up to 17 percentage points in code accuracy and near-saturation test coverage (99.81% to 99.90%) with MR augmentation, along with a case study demonstrating enhanced bug-fix efficacy. The work highlights metamorphic relations as a principled, practical approach to increasing robustness, reliability, and coverage in LLM-driven software development, and outlines recommendations for sustainable deployment and future extensions such as adaptive MR selection and multi-agent collaboration.

Abstract

Metamorphic Relations (MRs) serve as a foundational mechanism for generating semantically equivalent mutations. Software engineering has advanced significantly in recent years with the advent of Large Language Models (LLMs). However, the reliability of LLMs in software engineering is often compromised by ambiguities and inconsistencies that stem from imprecise user specifications. To address this challenge, we present CodeMetaAgent (CMA), a metamorphic relation-driven LLM agent that systematically refines task specifications and generates semantically constrained test cases. Unlike the traditional use of MRs for post-hoc validation, our framework applies MRs together with LLMs during generation to improve consistency and reduce the variability caused by specifications. We evaluated the framework on the HumanEval-Pro, MBPP-Pro, and SWE-Bench-Lite datasets using the GPT-4o, Mistral Large, GPT-OSS, and Qwen3-Coder models. It improved code generation accuracy by up to 17% and reached code coverage of up to 99.81%. These results show that metamorphic relations can be a simple but effective guide for LLM-based software development.
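To make the idea of a metamorphic relation concrete, the following minimal sketch checks one generic MR on test inputs: reordering a list should not change the result of an order-independent function. It illustrates the general concept only and is not CMA's implementation or its prompts. The names `reorderings`, `holds_permutation_mr`, `good_max`, and `buggy_max` are hypothetical and chosen for illustration.

```python
from typing import Callable, List, Sequence


def reorderings(xs: Sequence[int]) -> List[List[int]]:
    """Hypothetical MR transform: return the reversal and all rotations of xs."""
    items = list(xs)
    variants = [items[::-1]]
    for k in range(1, len(items)):
        variants.append(items[k:] + items[:k])
    return variants


def holds_permutation_mr(func: Callable[[List[int]], int], xs: Sequence[int]) -> bool:
    """Check the relation func(reordered(xs)) == func(xs) for every variant.

    No reference output is needed: the original call serves as the oracle
    for the follow-up calls.
    """
    expected = func(list(xs))
    return all(func(variant) == expected for variant in reorderings(xs))


def good_max(xs: List[int]) -> int:
    """Order-independent implementation (assumed correct for this sketch)."""
    return max(xs)


def buggy_max(xs: List[int]) -> int:
    """Deliberately wrong: returns the last element instead of the maximum."""
    return xs[-1]


if __name__ == "__main__":
    sample = [3, 1, 4, 1, 5]
    print(holds_permutation_mr(good_max, sample))
    print(holds_permutation_mr(buggy_max, sample))
```

On the sample input, `buggy_max` happens to return the right answer for the original ordering, so a single input/output test would pass. The reordered follow-up inputs are what expose the defect, which is the kind of added coverage that MR-derived test inputs are meant to provide.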

Paper Structure

This paper contains 37 sections, 11 figures, 8 tables.

Figures (11)

  • Figure 1: Workflow of the CMA framework.
  • Figure 2: Prompt used to generate the metamorphically transformed problem description.
  • Figure 3: Prompt used to generate metamorphically transformed test cases.
  • Figure 4: Example of the code generation process using an LLM: (A) following the CMA pipeline, and (B) without the CMA mutation process.
  • Figure 5: Example of the test case generation process using an LLM: (A) following the CMA pipeline, and (B) without the CMA mutation process.
  • ...and 6 more figures