Teralizer: Semantics-Based Test Generalization from Conventional Unit Tests to Property-Based Tests
Johann Glock, Clemens Bauer, Martin Pinzger
TL;DR
Teralizer introduces a semantics-based approach that generalizes conventional unit tests into property-based tests by extracting path-exact specifications from program semantics using single-path symbolic analysis. Implemented as a Java prototype, Teralizer converts JUnit tests into jqwik tests through a five-stage pipeline. It is evaluated for mutation-detection gains on controlled benchmarks (EqBench, Apache Commons) and on real-world RepoReapers projects. Results show mutation-score improvements of 1–4 percentage points on EvoSuite-generated tests under controlled conditions, but substantial gaps in real-world applicability due to type-support limitations in symbolic analysis and non-standard project structures. The work contributes a reproducible framework for semantics-based test generalization, a detailed analysis of applicability barriers, and a roadmap for extending type support, interprocedural analysis, and constraint encoding to broaden real-world usefulness. Overall, the study shows the promise of combining test generation with generalization to improve fault detection, while highlighting practical challenges in scaling to diverse real-world codebases.
Abstract
Conventional unit tests validate single input-output pairs, leaving most inputs of an execution path untested. Property-based testing addresses this shortcoming by checking properties against many generated inputs, but it requires significant manual effort to define the properties and their constraints. We propose a semantics-based approach that automatically transforms unit tests into property-based tests by extracting specifications from implementations via single-path symbolic analysis. We demonstrate this approach with Teralizer, a prototype for Java that transforms JUnit tests into property-based jqwik tests. Unlike prior work that generalizes from input-output examples, Teralizer derives specifications from program semantics. We evaluated Teralizer on three progressively more challenging datasets. On EvoSuite-generated tests for EqBench and Apache Commons utilities, Teralizer improved mutation scores by 1–4 percentage points. Generalizing mature developer-written tests for Apache Commons utilities yielded improvements of only 0.05–0.07 percentage points. Analysis of 632 real-world Java projects from RepoReapers highlights applicability barriers: only 1.7% of projects completed the generalization pipeline, with failures primarily due to type-support limitations in symbolic analysis and static-analysis limitations in our prototype. Based on these results, we provide a roadmap for future work, identifying the research and engineering challenges that must be tackled to advance the field of test generalization. Artifacts are available at: https://doi.org/10.5281/zenodo.17950381
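The following plain-Java sketch illustrates the idea of path-exact generalization under simplifying assumptions. It is not Teralizer's implementation or output. The method `scale`, its mutant, and all helper names are hypothetical. The path condition and symbolic output are written by hand rather than recorded by a symbolic executor. Sampling with `java.util.Random` stands in for a jqwik `@Property`. The unit test `scale(3) == 6` exercises the path `x <= 10`, whose symbolic output is `2 * x`. Turning that pair into a precondition and an oracle lets the generalized test detect a mutant that the single input-output pair misses.

```java
import java.util.Random;
import java.util.function.IntUnaryOperator;

// Hypothetical illustration of path-exact test generalization; not Teralizer's code.
class GeneralizationSketch {

    // Hypothetical method under test with two execution paths.
    static int scale(int x) {
        if (x > 10) {
            return 20;   // path A: x > 10
        }
        return 2 * x;    // path B: x <= 10
    }

    // Hypothetical mutant: path B computes x + 3 instead of 2 * x.
    static int scaleMutant(int x) {
        if (x > 10) {
            return 20;
        }
        return x + 3;
    }

    // Conventional unit test: a single input-output pair, scale(3) == 6.
    static boolean conventionalTest(IntUnaryOperator impl) {
        return impl.applyAsInt(3) == 6;
    }

    // Path condition of the path executed by input 3 (hand-written here;
    // a symbolic executor would record it along that path).
    static boolean pathCondition(int x) {
        return x <= 10;
    }

    // Symbolic output of that path, used as the test oracle.
    static int expectedOutput(int x) {
        return 2 * x;
    }

    // Generalized test: checks the oracle on the path's boundary value (10),
    // then on random inputs, discarding those that violate the path condition.
    static boolean generalizedTest(IntUnaryOperator impl, int tries, long seed) {
        if (impl.applyAsInt(10) != expectedOutput(10)) {
            return false;
        }
        Random rnd = new Random(seed);
        for (int i = 0; i < tries; i++) {
            int x = rnd.nextInt(2001) - 1000;   // x in [-1000, 1000]
            if (!pathCondition(x)) {
                continue;                       // input lies off this path
            }
            if (impl.applyAsInt(x) != expectedOutput(x)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("conventional, original: " + conventionalTest(GeneralizationSketch::scale));
        System.out.println("conventional, mutant:   " + conventionalTest(GeneralizationSketch::scaleMutant));
        System.out.println("generalized, original:  " + generalizedTest(GeneralizationSketch::scale, 1000, 42L));
        System.out.println("generalized, mutant:    " + generalizedTest(GeneralizationSketch::scaleMutant, 1000, 42L));
    }
}
```

Running `main` prints `true`, `true`, `true`, `false`. The mutant survives the conventional test because `3 + 3 == 2 * 3`. It is killed by the generalized test because `x + 3 != 2 * x` for every other input on the same path. In this sketch the boundary check at `x = 10` alone already exposes it. The generalized test also never checks inputs with `x > 10`, because its specification is exact only for the path that the original test executed.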
