Table of Contents
Fetching ...

Making Formulog Fast: An Argument for Unconventional Datalog Evaluation (Extended Version)

Aaron Bembenek, Michael Greenberg, Stephen Chong

TL;DR

Evaluated Formulog implementations of refinement type checking, bottom-up pointer analysis, and symbolic execution achieve speedups on 20 out of 23 benchmarks over previously published, hand-tuned analyses written in F♯, Java, and C++, providing strong evidence that Formulog can be the basis of a realistic platform for SMT-based static analysis.

Abstract

By combining Datalog, SMT solving, and functional programming, the language Formulog provides an appealing mix of features for implementing SMT-based static analyses (e.g., refinement type checking, symbolic execution) in a natural, declarative way. At the same time, the performance of its custom Datalog solver can be an impediment to using Formulog beyond prototyping -- a common problem for Datalog variants that aspire to solve large problem instances. In this work we speed up Formulog evaluation, with surprising results: while 2.2x speedups are obtained by using the conventional techniques for high-performance Datalog (e.g., compilation, specialized data structures), the big wins come by abandoning the central assumption in modern performant Datalog engines, semi-naive Datalog evaluation. In its place, we develop eager evaluation, a concurrent Datalog evaluation algorithm that explores the logical inference space via a depth-first traversal order. In practice, eager evaluation leads to an advantageous distribution of Formulog's SMT workload to external SMT solvers and improved SMT solving times: our eager evaluation extensions to the Formulog interpreter and Soufflé's code generator achieve mean 5.2x and 7.6x speedups, respectively, over the optimized code generated by off-the-shelf Soufflé on SMT-heavy Formulog benchmarks. Using compilation and eager evaluation, Formulog implementations of refinement type checking, bottom-up pointer analysis, and symbolic execution achieve speedups on 20 out of 23 benchmarks over previously published, hand-tuned analyses written in F#, Java, and C++, providing strong evidence that Formulog can be the basis of a realistic platform for SMT-based static analysis. Moreover, our experience adds nuance to the conventional wisdom that semi-naive evaluation is the one-size-fits-all best Datalog evaluation algorithm for static analysis workloads.

Making Formulog Fast: An Argument for Unconventional Datalog Evaluation (Extended Version)

TL;DR

Evaluated Formulog implementations of refinement type checking, bottom-up pointer analysis, and symbolic execution achieve speedups on 20 out of 23 benchmarks over previously published, hand-tuned analyses written in F♯, Java, and C++, providing strong evidence that Formulog can be the basis of a realistic platform for SMT-based static analysis.

Abstract

By combining Datalog, SMT solving, and functional programming, the language Formulog provides an appealing mix of features for implementing SMT-based static analyses (e.g., refinement type checking, symbolic execution) in a natural, declarative way. At the same time, the performance of its custom Datalog solver can be an impediment to using Formulog beyond prototyping -- a common problem for Datalog variants that aspire to solve large problem instances. In this work we speed up Formulog evaluation, with surprising results: while 2.2x speedups are obtained by using the conventional techniques for high-performance Datalog (e.g., compilation, specialized data structures), the big wins come by abandoning the central assumption in modern performant Datalog engines, semi-naive Datalog evaluation. In its place, we develop eager evaluation, a concurrent Datalog evaluation algorithm that explores the logical inference space via a depth-first traversal order. In practice, eager evaluation leads to an advantageous distribution of Formulog's SMT workload to external SMT solvers and improved SMT solving times: our eager evaluation extensions to the Formulog interpreter and Soufflé's code generator achieve mean 5.2x and 7.6x speedups, respectively, over the optimized code generated by off-the-shelf Soufflé on SMT-heavy Formulog benchmarks. Using compilation and eager evaluation, Formulog implementations of refinement type checking, bottom-up pointer analysis, and symbolic execution achieve speedups on 20 out of 23 benchmarks over previously published, hand-tuned analyses written in F#, Java, and C++, providing strong evidence that Formulog can be the basis of a realistic platform for SMT-based static analysis. Moreover, our experience adds nuance to the conventional wisdom that semi-naive evaluation is the one-size-fits-all best Datalog evaluation algorithm for static analysis workloads.
Paper Structure (58 sections, 10 theorems, 9 equations, 7 figures, 2 tables, 1 algorithm)

This paper contains 58 sections, 10 theorems, 9 equations, 7 figures, 2 tables, 1 algorithm.

Key Result

theorem 1

Given an arbitrary Datalog program, eager evaluation and semi-naive evaluation derive exactly the same facts.

Figures (7)

  • Figure 1: A grammar distilling the standard features of Datalog; we model Soufflé by adding functor calls$@{f}^{\mathrm{cpp}}(\mathbf{t})$, which are FFI calls invoking external C++ functions.
  • Figure 2: A grammar for core Formulog, which extends Datalog with functional programming via algebraic data type definitions $T$, functions $F$, and expressions $e$. The functional language includes constructors for building terms representing SMT formulas and functions for testing their satisfiability and extracting models.
  • Figure 3: Rules for translating from Formulog to Soufflé. Negative predicates $!p(\mathbf{e})$ and inequality predicates $e \mathrel{\mathsf{\mathord{!}\mathord{=}}} e$ can be translated analogously to their positive counterparts. We use the notation $\texttt{++}$ for list concatenation.
  • Figure 4: Formulog compiled to off-the-shelf Soufflé achieves a 2.2$\times$ mean speedup over the baseline Formulog interpreter, while also using 38$\times$ less memory in the mean. Left is better; we omit compilation time (static analyses are compiled once and executed many times).
  • Figure 5: A Formulog program computes reachability over a tree labeled with SMT propositions $\phi_{(i,j)}$ by using the built-in constructor for conjunction () and the built-in function to check satisfiability. Backticks demarcate SMT formulas.
  • ...and 2 more figures

Theorems & Definitions (10)

  • theorem 1: Correctness
  • lemma 1: Soundness
  • lemma 2: Completeness
  • lemma 3
  • lemma 4
  • lemma 5
  • lemma 6: Completeness
  • lemma 7: Soundness
  • corollary 1
  • theorem 2: Correctness