HyperPUT: Generating Synthetic Faulty Programs to Challenge Bug-Finding Tools

Riccardo Felici, Laura Pozzi, Carlo A. Furia

TL;DR

The proposed technique, called HyperPUT, builds C programs from a “seed” bug by incrementally applying program transformations (introducing programming constructs such as conditionals, loops, etc.) until a program of the desired size is generated.

Abstract

As research on automatic bug detection grows and produces new techniques, suitable collections of programs with known bugs become crucial for comparing the effectiveness of these techniques reliably and meaningfully. Most existing approaches rely on benchmarks of manually curated real-world bugs, or of synthetic bugs seeded into real-world programs. Because these benchmarks depend on real-world programs, extending them or creating new ones remains a complex, time-consuming task. In this paper, we propose a complementary approach that automatically generates programs with seeded bugs. Our technique, called HyperPUT, builds C programs from a "seed" bug by incrementally applying program transformations (introducing programming constructs such as conditionals, loops, etc.) until a program of the desired size is generated. In our experimental evaluation, we demonstrate that HyperPUT can generate buggy programs that challenge the capabilities of modern bug-finding tools in different ways, and some of whose characteristics are comparable to those of bugs in existing benchmarks. These results suggest that HyperPUT can be a useful tool to support further research in bug-finding techniques, in particular their empirical evaluation.
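To make the idea of growing a program around a "seed" bug more concrete, the following is a minimal hand-written sketch of what such a layered program might look like. It is not output of HyperPUT: the function names (`seed_bug`, `layer_conditional`, `layer_loop`, `put_entry`), the guard conditions, and the choice to return `1` instead of actually crashing are all assumptions made for illustration. Each layer adds one construct (a conditional, then a loop) that an input must satisfy before the bug location becomes reachable.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical seed bug. In a real faulty program this would be a crash
   (for example abort() or an invalid memory access). Here it returns 1 so
   that reaching the bug location can be observed without killing the
   process. */
static int seed_bug(void) {
    return 1;
}

/* Layer 1 (a conditional): the seed is reachable only if the first byte
   of the input is 'H'. */
static int layer_conditional(const char *input, size_t len) {
    if (len > 0 && input[0] == 'H') {
        return seed_bug();
    }
    return 0;
}

/* Layer 2 (a loop): counts occurrences of 'x' and hands control to
   layer 1 only when the count is exactly 3. */
static int layer_loop(const char *input, size_t len) {
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if (input[i] == 'x') {
            count++;
        }
    }
    if (count == 3) {
        return layer_conditional(input, len);
    }
    return 0;
}

/* Entry point of the illustrative program under test: returns 1 if the
   input reaches the seed bug, 0 otherwise. */
int put_entry(const char *input) {
    return layer_loop(input, strlen(input));
}
```

In this sketch, an input such as `"Hxxx"` satisfies both layers, while `"Hxx"` fails the loop guard and `"axxx"` fails the conditional. Wrapping further layers around `put_entry` in the same way would yield a larger program with a deeper path to the bug, which is the kind of growth the abstract describes.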

Paper Structure (43 sections, 4 equations, 5 figures, 7 tables)


Figures (5)

  • Figure 1: Specification of a PUT that combines transformations $\textsf{SC}$ and $\textsf{IC}$ as in \ref{eq:sc-ic-example}.
  • Figure 2: Distribution of size (in number of transformations) of the PUTs used in the experimental evaluation.
  • Figure 3: Distributions of cyclomatic complexity per function in three collections of buggy programs: the PUTs in batch $B$ generated by HyperPUT, and benchmarks CGC \cite{cgc} and LAVA-1 \cite{lava}.
  • Figure 4: Distributions of the length of the execution path on a bug-triggering input in two collections of buggy programs: the PUTs in batch $B$ generated by HyperPUT, and benchmark LAVA-1 \cite{lava}.
  • Figure 5: Running time to discover the bug in each PUT in batches $B_{\textsf{IC}}, B_{\textsf{SC}}, B_{\textsf{FL}}, B_{\textsf{PC}}, B_{\textsf{CC}}, B_{\star}$. The horizontal axis enumerates the 10 PUTs in each batch in order of size (number of transformations). The vertical axis measures the running time (in seconds) until the tool terminates or times out (as in all other experiments, we report the average of 4 repeated runs). A filled colored disc indicates that the tool terminated successfully (it discovered the bug); a grayed-out circle indicates that the tool terminated or timed out without discovering the bug. Data for AFL are shown in blue, for CBMC in black, and for KLEE in yellow.