
Etna: An Evaluation Platform for Property-Based Testing

Alperen Keles, Jessica Shi, Nikhil Kamath, Tin Nam Liu, Ceren Mert, Harrison Goldstein, Benjamin C. Pierce, Leonidas Lampropoulos

Abstract

Property-based testing is a mainstay of functional programming, boasting a rich literature, an enthusiastic user community, and an abundance of tools -- so many, indeed, that new users may have difficulty choosing. Moreover, any given framework may support a variety of strategies for generating test inputs; even experienced users may wonder which are better in any given situation. Sadly, the PBT literature, though long on creativity, is short on rigorous comparisons to help answer such questions. We present ETNA, a platform for empirical evaluation and comparison of PBT techniques. ETNA incorporates a number of popular PBT frameworks and testing workloads from the literature, and its extensible architecture makes adding new ones easy, while handling the technical drudgery of performance measurement. To illustrate its benefits, we use ETNA to carry out several experiments with popular PBT approaches in Rocq, Haskell, OCaml, Racket, and Rust, allowing users to more clearly understand best practices and tradeoffs.
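To make the setting concrete, a property-based test pairs a generator of random inputs with a property that should hold for every input. The sketch below is a minimal, hypothetical illustration of this idea (it is not ETNA's code and uses only the Python standard library), using the binary-search-tree workload that appears in the paper's figures: a "naive" generator builds random trees, and the property checked is that insertion preserves the BST invariant.

```python
import random

# Hypothetical sketch of property-based testing (not ETNA's actual code).
# Workload: insertion into a binary search tree, as in the paper's BST figures.

def insert(tree, key):
    """Insert key into a BST represented as nested tuples (left, key, right)."""
    if tree is None:
        return (None, key, None)
    left, k, right = tree
    if key < k:
        return (insert(left, key), k, right)
    if key > k:
        return (left, k, insert(right, key))
    return tree  # key already present

def is_bst(tree, lo=float("-inf"), hi=float("inf")):
    """The property's invariant: every key lies strictly within its bounds."""
    if tree is None:
        return True
    left, k, right = tree
    return lo < k < hi and is_bst(left, lo, k) and is_bst(right, k, hi)

def naive_generator(rng, size):
    """A simple generation strategy: build a tree from random insertions."""
    tree = None
    for _ in range(rng.randint(0, size)):
        tree = insert(tree, rng.randint(-100, 100))
    return tree

def check_property(trials=1000, seed=0):
    """Run random trials; return the first counterexample, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        tree = naive_generator(rng, size=10)
        key = rng.randint(-100, 100)
        if not is_bst(insert(tree, key)):
            return (tree, key)  # property violated on this input
    return None  # property held on all trials

print(check_property())
```

Real PBT frameworks such as QuickCheck differ mainly in how the generator is obtained (derived from types, written by hand, or guided by feedback), which is exactly the space of strategies the experiments below compare.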

Paper Structure

This paper contains 24 sections and 10 figures.

Figures (10)

  • Figure 1: Effectiveness of Haskell generation strategies on four workloads. Legend: Naive QuickCheck, Naive LeanCheck, Naive SmallCheck, Bespoke QuickCheck.
  • Figure 2: Number of generated inputs (averaged over 100 trials) to solve each BST task, as input size increases from three to 30 nodes.
  • Figure 3: Legend: Naive LeanCheck (default order), Naive LeanCheck (reverse order), Naive SmallCheck (default order), Naive SmallCheck (reverse order).
  • Figure 4: Effectiveness of Rocq generation strategies on four workloads. Legend: Type-based generator, Type-based fuzzer, Specification-based generator ((a)-(c) only), Variational fuzzer ((d) only), Bespoke generator.
  • Figure 5: IFC tasks solved within the timeout in one or more trials. Legend: Type-based generator (empty marker), Type-based fuzzer, Variational fuzzer, Bespoke generator.
  • ...and 5 more figures