Tuning Random Generators: Property-Based Testing as Probabilistic Programming
Ryan Tjoa, Poorva Garg, Harrison Goldstein, Todd Millstein, Benjamin Pierce, Guy Van den Broeck
TL;DR
This work tackles the challenge of tuning property-based testing generators by learning symbolic weights offline to optimize user-defined objectives. It introduces Loaded Dice, a discrete probabilistic programming language with exact inference via binary decision diagrams and differentiable gradients, enabling gradient-based learning of generator weights. The framework supports target distributions, diversity via entropy, and validity via specification entropy, with REINFORCE used to handle entropy objectives efficiently. Empirical results on Etna benchmarks show that automatically tuned generators can dramatically accelerate bug finding (3.1–7.4x) while increasing the diversity and validity of test cases. Practically, this approach provides a principled, automated way to shape input distributions in PBT, reducing manual tuning burdens and improving testing efficacy.
Abstract
Property-based testing validates software against an executable specification by evaluating it on randomly generated inputs. The standard way that PBT users generate test inputs is via generators that describe how to sample test inputs through random choices. To achieve a good distribution over test inputs, users must tune their generators, i.e., decide on the weights of these individual random choices. Unfortunately, it is very difficult to understand how to choose individual generator weights in order to achieve a desired distribution, so today this process is tedious and limits the distributions that can be practically achieved. In this paper, we develop techniques for the automatic and offline tuning of generators. Given a generator with undetermined symbolic weights and an objective function, our approach automatically learns values for these weights that optimize for the objective. We describe useful objective functions that allow users to (1) target desired distributions and (2) improve the diversity and validity of their test cases. We have implemented our approach in a novel discrete probabilistic programming system, Loaded Dice, that supports differentiation and parameter learning, and use it as a language for generators. We empirically demonstrate that our approach is effective at optimizing generator distributions according to the specified objective functions. We also perform a thorough evaluation on PBT benchmarks, demonstrating that, when automatically tuned for diversity and validity, the generators exhibit a 3.1-7.4x speedup in bug finding.
