Test Adequacy for Metamorphic Testing: Criteria, Measurement, and Implication

An Fu; Chang-ai Sun; Jiaming Zhang; Huai Liu

TL;DR

This work addresses the gap in assessing metamorphic testing (MT) adequacy by introducing a $k$-MR coverage criterion that jointly accounts for metamorphic relations (MRs) and source inputs. It defines a formal MT adequacy measure, $C_{MT}^k$, that rewards diverse MR–source-input associations and varying test requirements, then validates the approach through large-scale empirical studies across seven programs and thousands of mutants. The results show that higher MT adequacy generally improves fault-detection effectiveness, with stronger coverage criteria (e.g., IO-CTF) achieving the best performance and revealing real-life defects. The findings offer practical guidelines for constructing effective MT test suites and provide a foundation for future work on MR/input selection and prioritization to balance effectiveness and cost.

Abstract

Metamorphic testing (MT) is a simple yet effective technique for alleviating the oracle problem in software testing. The underlying idea of MT is to test a software system by checking whether metamorphic relations (MRs) hold among multiple test inputs (including source and follow-up inputs) and the actual outputs of their executions. Since MRs and source inputs are the two essential components of MT, considerable effort has gone into the systematic identification of MRs and the effective generation of source inputs, which has greatly enriched the fundamental theory of MT since its invention. However, few studies have investigated test adequacy assessment for MT, which hinders both the objective measurement of MT's test quality and the effective construction of test suites. Although traditional software testing offers a number of test adequacy criteria that specify, from various perspectives, the testing requirements that constitute an adequate test, they are not in line with MT's focus, which is to test the software under test (SUT) from the perspective of necessary properties. In this paper, we propose a new set of criteria that specifies testing requirements from the perspective of necessary properties satisfied by the SUT, and design a test adequacy measurement that evaluates the degree of adequacy based on both MRs and source inputs. The experimental results show that the proposed measurement can effectively indicate the fault detection effectiveness of test suites, i.e., test suites with higher test adequacy usually exhibit higher effectiveness in fault detection. Our work is an attempt to assess the test adequacy of MT from a new perspective; our criteria and measurement provide a new approach to evaluating the test quality of MT and offer guidelines for constructing effective MT test suites.
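
The basic MT mechanism described in the abstract (checking an MR across a source input and its follow-up input, without an oracle for either output) can be illustrated with a minimal sketch. Everything here is an illustrative assumption and is not taken from the paper: the stand-in program is a sine function, the MR is the identity $\sin(x) = \sin(\pi - x)$, and the function names `follow_up_input`, `output_relation_holds`, and `run_metamorphic_test` are hypothetical.

```python
import math
import random


def program_under_test(x: float) -> float:
    """Hypothetical SUT: a sine implementation whose exact outputs are hard to check by hand."""
    return math.sin(x)


def follow_up_input(source: float) -> float:
    """Input side of the assumed MR: derive the follow-up input from the source input."""
    return math.pi - source


def output_relation_holds(source_out: float, follow_up_out: float) -> bool:
    """Output side of the assumed MR: sin(x) == sin(pi - x), up to floating-point tolerance."""
    return math.isclose(source_out, follow_up_out, abs_tol=1e-9)


def run_metamorphic_test(sut, source_inputs):
    """Return the source inputs for which the MR is violated (an empty list means no failure revealed)."""
    violations = []
    for x in source_inputs:
        if not output_relation_holds(sut(x), sut(follow_up_input(x))):
            violations.append(x)
    return violations


# Randomly generated source inputs; the seed is fixed only for reproducibility.
rng = random.Random(0)
source_inputs = [rng.uniform(-10.0, 10.0) for _ in range(100)]
violations = run_metamorphic_test(program_under_test, source_inputs)


def faulty_sine(x: float) -> float:
    """A deliberately faulty variant (a hand-written 'mutant') that breaks the MR."""
    return math.sin(x) + 0.01 * x


faulty_violations = run_metamorphic_test(faulty_sine, [0.0, 1.0])
```

Note that no expected value of `sin(x)` is ever computed: a fault is revealed only when the relation between the two outputs is violated. Which MRs are exercised, and on which source inputs, therefore determines what a metamorphic test suite can detect, which is the aspect the adequacy criteria in this paper aim to measure.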

Paper Structure (27 sections, 9 equations, 11 figures, 6 tables)

Introduction
Background
Metamorphic Testing
Test Adequacy Criteria
Adequacy Criteria and Measurement for MT
Motivation
$k$-MR coverage criterion
Test adequacy measurement
Empirical Studies
Research Questions
Subject Programs and Their Faulty Versions
Identification of MRs
Construction of Test Input Pools
Generation of Metamorphic Test Suites
Variables and Measures
...and 12 more sections

Figures (11)

Figure 1: Comparison between MT and conventional testing techniques. $p(t)$ denotes the actual output of executing input $t$ on $p$, and $f(t)$ denotes the test oracle for $p(t)$. Figure 1(a) illustrates the test result verification mechanism of conventional testing techniques; Figure 1(b) illustrates the test result verification mechanism of MT, using an MR that relates two test inputs.
Figure 2: Illustration of $k$-MR coverage criterion. $R^{i}_{in}$ denotes the input subrelation of an MR and $R^{i}_{out}$ denotes the output subrelation of an MR.
Figure 3: Distribution of the fault detection effectiveness of $k$-MR coverage criterion under different values of $k$ on program PHONE
Figure 4: Distribution of the fault detection effectiveness of $k$-MR coverage criterion under different test case adequacy criteria on program print_tokens
Figure 5: Average of the fault detection rate of $k$-MR coverage criterion under different test case adequacy criteria on program grep
...and 6 more figures

Theorems & Definitions (5)

Definition 1: Association relationship between a source input and an MR
Definition 2: Test Adequacy Criterion for MT
Definition 3: MRs covered by a source input
Definition 4: $k$-MR coverage criterion, $\mathcal{C}_{\rm{MT}}(\mathcal{C}_{c})$
Definition 5: Test adequacy measurement based on $k$-MR coverage criterion
