Table of Contents
Fetching ...

Tuning Random Generators: Property-Based Testing as Probabilistic Programming

Ryan Tjoa, Poorva Garg, Harrison Goldstein, Todd Millstein, Benjamin Pierce, Guy Van den Broeck

TL;DR

This work tackles the challenge of tuning property-based testing generators by learning symbolic weights offline to optimize user-defined objectives. It introduces Loaded Dice, a discrete probabilistic programming language with exact inference via binary decision diagrams and differentiable gradients, enabling gradient-based learning of generator weights. The framework supports target distributions, diversity via entropy, and validity via specification entropy, with REINFORCE used to handle entropy objectives efficiently. Empirical results on Etna benchmarks show that automatically tuned generators can dramatically accelerate bug finding (3.1–7.4x) while increasing the diversity and validity of test cases. Practically, this approach provides a principled, automated way to shape input distributions in PBT, reducing manual tuning burdens and improving testing efficacy.

Abstract

Property-based testing validates software against an executable specification by evaluating it on randomly generated inputs. The standard way that PBT users generate test inputs is via generators that describe how to sample test inputs through random choices. To achieve a good distribution over test inputs, users must tune their generators, i.e., decide on the weights of these individual random choices. Unfortunately, it is very difficult to understand how to choose individual generator weights in order to achieve a desired distribution, so today this process is tedious and limits the distributions that can be practically achieved. In this paper, we develop techniques for the automatic and offline tuning of generators. Given a generator with undetermined symbolic weights and an objective function, our approach automatically learns values for these weights that optimize for the objective. We describe useful objective functions that allow users to (1) target desired distributions and (2) improve the diversity and validity of their test cases. We have implemented our approach in a novel discrete probabilistic programming system, Loaded Dice, that supports differentiation and parameter learning, and use it as a language for generators. We empirically demonstrate that our approach is effective at optimizing generator distributions according to the specified objective functions. We also perform a thorough evaluation on PBT benchmarks, demonstrating that, when automatically tuned for diversity and validity, the generators exhibit a 3.1-7.4x speedup in bug finding.

Tuning Random Generators: Property-Based Testing as Probabilistic Programming

TL;DR

This work tackles the challenge of tuning property-based testing generators by learning symbolic weights offline to optimize user-defined objectives. It introduces Loaded Dice, a discrete probabilistic programming language with exact inference via binary decision diagrams and differentiable gradients, enabling gradient-based learning of generator weights. The framework supports target distributions, diversity via entropy, and validity via specification entropy, with REINFORCE used to handle entropy objectives efficiently. Empirical results on Etna benchmarks show that automatically tuned generators can dramatically accelerate bug finding (3.1–7.4x) while increasing the diversity and validity of test cases. Practically, this approach provides a principled, automated way to shape input distributions in PBT, reducing manual tuning burdens and improving testing efficacy.

Abstract

Property-based testing validates software against an executable specification by evaluating it on randomly generated inputs. The standard way that PBT users generate test inputs is via generators that describe how to sample test inputs through random choices. To achieve a good distribution over test inputs, users must tune their generators, i.e., decide on the weights of these individual random choices. Unfortunately, it is very difficult to understand how to choose individual generator weights in order to achieve a desired distribution, so today this process is tedious and limits the distributions that can be practically achieved. In this paper, we develop techniques for the automatic and offline tuning of generators. Given a generator with undetermined symbolic weights and an objective function, our approach automatically learns values for these weights that optimize for the objective. We describe useful objective functions that allow users to (1) target desired distributions and (2) improve the diversity and validity of their test cases. We have implemented our approach in a novel discrete probabilistic programming system, Loaded Dice, that supports differentiation and parameter learning, and use it as a language for generators. We empirically demonstrate that our approach is effective at optimizing generator distributions according to the specified objective functions. We also perform a thorough evaluation on PBT benchmarks, demonstrating that, when automatically tuned for diversity and validity, the generators exhibit a 3.1-7.4x speedup in bug finding.

Paper Structure

This paper contains 60 sections, 1 theorem, 20 equations, 16 figures, 4 tables.

Key Result

proposition 1

The gradient of the entropy of $p_{G,\theta}$, which can be expressed as the expectation of $\log{p_{G, \theta}(x)}$, can be estimated as follows:

Figures (16)

  • Figure 1: A generator of (not necessarily valid) red-black trees, using symbolic weights. The macros @type and @match, provided by the Loaded Dice embedding in Julia, implement algebraic datatypes and pattern matching. The match expression computing w allows the weights to depend on size, as described in \ref{['sec:splitting']}.
  • Figure 2: Tuning the distributions of heights of generated values to be uniform or linear for several generators. The top row trains the generators to have a uniform distribution over heights, and the bottom row trains the generators to have a linear distribution over heights.
  • Figure 3: Tuning Generators in Loaded Dice.
  • Figure 4: Results for tuning an STLC generator for unique types using specification entropy. For brevity, the legend of (a) shows only the most common types and abbreviates "Bool" as "B."
  • Figure 5: Cumulative unique valid red-black trees throughout sampling, for our type-based RBT generator tuned for the different objective functions. We regularize weights as described in \ref{['sec:regularization']}.
  • ...and 11 more figures

Theorems & Definitions (12)

  • Definition 1: objective function
  • Example 1
  • Definition 2: push-forward of generator distribution
  • Example 2
  • Definition 3: target objective function
  • Example 3
  • Definition 4: entropy objective function
  • Definition 5: specification objective function
  • Definition 6: specification entropy objective function
  • Definition 7: feature specification entropy objective function
  • ...and 2 more