Teralizer: Semantics-Based Test Generalization from Conventional Unit Tests to Property-Based Tests

Johann Glock, Clemens Bauer, Martin Pinzger

TL;DR

Teralizer generalizes conventional unit tests into property-based tests by extracting path-exact specifications from program semantics using single-path symbolic analysis. Implemented as a Java prototype, it converts JUnit tests into jqwik tests through a five-stage pipeline and is evaluated for mutation-detection gains on controlled benchmarks (EqBench, Apache Commons) and on real-world RepoReapers projects. Results show mutation-score improvements of 1–4 percentage points under controlled conditions, but a substantial gap to real-world applicability caused by type-support limitations in symbolic analysis and non-standard project structures. The work contributes a reproducible framework for semantics-based test generalization, a detailed analysis of applicability barriers, and a roadmap for extending type support, interprocedural analysis, and constraint encoding to broaden real-world usefulness. Overall, the study shows promise in combining test generation with generalization to improve fault detection, while highlighting pragmatic challenges in scaling to diverse real-world codebases.

Abstract

Conventional unit tests validate single input-output pairs, leaving most inputs of an execution path untested. Property-based testing addresses this shortcoming by generating multiple inputs satisfying properties but requires significant manual effort to define properties and their constraints. We propose a semantics-based approach that automatically transforms unit tests into property-based tests by extracting specifications from implementations via single-path symbolic analysis. We demonstrate this approach through Teralizer, a prototype for Java that transforms JUnit tests into property-based jqwik tests. Unlike prior work that generalizes from input-output examples, Teralizer derives specifications from program semantics. We evaluated Teralizer on three progressively challenging datasets. On EvoSuite-generated tests for EqBench and Apache Commons utilities, Teralizer improved mutation scores by 1–4 percentage points. Generalization of mature developer-written tests from Apache Commons utilities improved mutation scores by only 0.05–0.07 percentage points. Analysis of 632 real-world Java projects from RepoReapers highlights applicability barriers: only 1.7% of projects completed the generalization pipeline, with failures primarily due to type support limitations in symbolic analysis and static analysis limitations in our prototype. Based on the results, we provide a roadmap for future work, identifying research and engineering challenges that need to be tackled to advance the field of test generalization. Artifacts available at: https://doi.org/10.5281/zenodo.17950381
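To make the core idea concrete, the following is a minimal sketch of what generalizing a single-example test to a path-wide property could look like. It is not Teralizer's output and does not use the jqwik API: it is plain Java so that it runs without dependencies. The method `clampToPercent`, its regressed variant, and the helpers `unitTestPasses` and `generalizedTestPasses` are all hypothetical names introduced here, and the path condition (`0 <= value <= 100`) is written by hand rather than derived by symbolic analysis.

```java
import java.util.Random;
import java.util.function.IntUnaryOperator;

class GeneralizationSketch {

    // Hypothetical method under test: clamps a value into [0, 100].
    static int clampToPercent(int value) {
        if (value < 0) return 0;
        if (value > 100) return 100;
        return value;
    }

    // Hypothetical regression: the upper bound was changed from 100 to 60.
    static int clampToPercentRegressed(int value) {
        if (value < 0) return 0;
        if (value > 60) return 100;
        return value;
    }

    // Conventional unit test: a single input-output pair on the
    // "value is already in range" execution path.
    static boolean unitTestPasses(IntUnaryOperator impl) {
        return impl.applyAsInt(50) == 50;
    }

    // Generalized test: checks the same path for many inputs.
    // Assumed path condition of the original test: 0 <= value <= 100.
    // Assumed input-output relation on that path: result == value.
    static boolean generalizedTestPasses(IntUnaryOperator impl, int tries, long seed) {
        Random random = new Random(seed);
        for (int i = 0; i < tries; i++) {
            int value = random.nextInt(101); // uniform in 0..100 inclusive
            if (impl.applyAsInt(value) != value) {
                return false; // counterexample found on the generalized path
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("unit test, original:         "
                + unitTestPasses(GeneralizationSketch::clampToPercent));
        System.out.println("unit test, regressed:        "
                + unitTestPasses(GeneralizationSketch::clampToPercentRegressed));
        System.out.println("generalized test, original:  "
                + generalizedTestPasses(GeneralizationSketch::clampToPercent, 1000, 42L));
        System.out.println("generalized test, regressed: "
                + generalizedTestPasses(GeneralizationSketch::clampToPercentRegressed, 1000, 42L));
    }
}
```

In this sketch the single-example test accepts both implementations because the input 50 behaves identically in each, whereas the generalized check rejects the regressed one as soon as it samples a value between 61 and 99. In a property-based framework such as jqwik, the sampling loop would be replaced by the framework's own input generation and constraint handling.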

Paper Structure

This paper contains 66 sections, 7 figures, 15 tables, 1 algorithm.

Figures (7)

  • Figure 1: The conventional unit test misses a regression that the property-based test detects.
  • Figure 2: Teralizer takes implementation and test code as input, and produces property-based tests as output.
  • Figure 3: Overview of Teralizer's test generalization process.
  • Figure 4: Percentage of detected mutants (left side) and improvement over INITIAL (right side) per project and generalization strategy. Improvements show both the absolute improvement (top value) and the relative improvement (bottom value).
  • Figure 5: Runtime comparison between original and generalized tests. The runtime differences measure how much longer generalized tests take to execute, on average, compared to corresponding original tests. We show the difference per test (left) and per try (right).
  • ...and 2 more figures