Making Formulog Fast: An Argument for Unconventional Datalog Evaluation (Extended Version)

Aaron Bembenek; Michael Greenberg; Stephen Chong

Making Formulog Fast: An Argument for Unconventional Datalog Evaluation (Extended Version)

Aaron Bembenek, Michael Greenberg, Stephen Chong

TL;DR

Evaluated Formulog implementations of refinement type checking, bottom-up pointer analysis, and symbolic execution achieve speedups on 20 out of 23 benchmarks over previously published, hand-tuned analyses written in F♯, Java, and C++, providing strong evidence that Formulog can be the basis of a realistic platform for SMT-based static analysis.

Abstract

By combining Datalog, SMT solving, and functional programming, the language Formulog provides an appealing mix of features for implementing SMT-based static analyses (e.g., refinement type checking, symbolic execution) in a natural, declarative way. At the same time, the performance of its custom Datalog solver can be an impediment to using Formulog beyond prototyping -- a common problem for Datalog variants that aspire to solve large problem instances. In this work we speed up Formulog evaluation, with surprising results: while 2.2x speedups are obtained by using the conventional techniques for high-performance Datalog (e.g., compilation, specialized data structures), the big wins come by abandoning the central assumption in modern performant Datalog engines, semi-naive Datalog evaluation. In its place, we develop eager evaluation, a concurrent Datalog evaluation algorithm that explores the logical inference space via a depth-first traversal order. In practice, eager evaluation leads to an advantageous distribution of Formulog's SMT workload to external SMT solvers and improved SMT solving times: our eager evaluation extensions to the Formulog interpreter and Soufflé's code generator achieve mean 5.2x and 7.6x speedups, respectively, over the optimized code generated by off-the-shelf Soufflé on SMT-heavy Formulog benchmarks. Using compilation and eager evaluation, Formulog implementations of refinement type checking, bottom-up pointer analysis, and symbolic execution achieve speedups on 20 out of 23 benchmarks over previously published, hand-tuned analyses written in F#, Java, and C++, providing strong evidence that Formulog can be the basis of a realistic platform for SMT-based static analysis. Moreover, our experience adds nuance to the conventional wisdom that semi-naive evaluation is the one-size-fits-all best Datalog evaluation algorithm for static analysis workloads.

Making Formulog Fast: An Argument for Unconventional Datalog Evaluation (Extended Version)

TL;DR

Abstract

Paper Structure (58 sections, 10 theorems, 9 equations, 7 figures, 2 tables, 1 algorithm)

This paper contains 58 sections, 10 theorems, 9 equations, 7 figures, 2 tables, 1 algorithm.

Introduction
Impact Beyond Formulog
We push the frontiers of Datalog for static analysis
We highlight how Datalog variants can be sensitive to logical inference order
We question the conventional wisdom on speeding up Datalog
Contributions
Background
Datalog
Soufflé
Formulog
Compiling Formulog to Soufflé
Core Translation
Implementation Details
Representing Values
Compiling Destructors
...and 43 more sections

Key Result

theorem 1

Given an arbitrary Datalog program, eager evaluation and semi-naive evaluation derive exactly the same facts.

Figures (7)

Figure 1: A grammar distilling the standard features of Datalog; we model Soufflé by adding functor calls$@{f}^{\mathrm{cpp}}(\mathbf{t})$, which are FFI calls invoking external C++ functions.
Figure 2: A grammar for core Formulog, which extends Datalog with functional programming via algebraic data type definitions $T$, functions $F$, and expressions $e$. The functional language includes constructors for building terms representing SMT formulas and functions for testing their satisfiability and extracting models.
Figure 3: Rules for translating from Formulog to Soufflé. Negative predicates $!p(\mathbf{e})$ and inequality predicates $e \mathrel{\mathsf{\mathord{!}\mathord{=}}} e$ can be translated analogously to their positive counterparts. We use the notation $\texttt{++}$ for list concatenation.
Figure 4: Formulog compiled to off-the-shelf Soufflé achieves a 2.2$\times$ mean speedup over the baseline Formulog interpreter, while also using 38$\times$ less memory in the mean. Left is better; we omit compilation time (static analyses are compiled once and executed many times).
Figure 5: A Formulog program computes reachability over a tree labeled with SMT propositions $\phi_{(i,j)}$ by using the built-in constructor for conjunction () and the built-in function to check satisfiability. Backticks demarcate SMT formulas.
...and 2 more figures

Theorems & Definitions (10)

theorem 1: Correctness
lemma 1: Soundness
lemma 2: Completeness
lemma 3
lemma 4
lemma 5
lemma 6: Completeness
lemma 7: Soundness
corollary 1
theorem 2: Correctness

Making Formulog Fast: An Argument for Unconventional Datalog Evaluation (Extended Version)

TL;DR

Abstract

Making Formulog Fast: An Argument for Unconventional Datalog Evaluation (Extended Version)

Authors

TL;DR

Abstract

Table of Contents

Key Result

Figures (7)

Theorems & Definitions (10)