Table of Contents
Fetching ...

Universal Scalability in Declarative Program Analysis (with Choice-Based Combination Pruning)

Anastasios Antoniadis, Ilias Tsatiris, Nevill Grech, Yannis Smaragdakis

TL;DR

The paper tackles scalability in declarative program analysis by addressing the uncontrolled growth of results in Datalog-based analyses. It introduces Choice-Bound, a simple and universal technique that extensions non-deterministic choice to multiplicity bounds via a counting field and a hash-based identifier, parameterized by $N$. Applied to large frameworks such as Doop and Symbolic Value-Flow (symvalic) for Ethereum, Choice-Bound delivers substantial performance improvements—often exceeding an order of magnitude ($>10\times$)—while incurring only modest completeness loss. The approach yields a flexible design space for tuning precision vs. performance and demonstrates practical impact by enabling previously intractable analyses to run within reasonable time and resources.

Abstract

In this work, we present a simple, uniform, and elegant solution to the problem, with stunning practical effectiveness and application to virtually any Datalog-based analysis. The approach consists of leveraging the choice construct, supported natively in modern Datalog engines like Soufflé. The choice construct allows the definition of functional dependencies in a relation and has been used in the past for expressing worklist algorithms. We show a near-universal construction that allows the choice construct to flexibly limit evaluation of predicates. The technique is applicable to practically any analysis architecture imaginable, since it adaptively prunes evaluation results when a (programmer-controlled) projection of a relation exceeds a desired cardinality. We apply the technique to probably the largest, pre-existing Datalog analysis frameworks in existence: Doop (for Java bytecode) and the main client analyses from the Gigahorse framework (for Ethereum smart contracts). Without needing to understand the existing analysis logic and with minimal, local-only changes, the performance of each framework increases dramatically, by over 20x for the hardest inputs, with near-negligible sacrifice in completeness.

Universal Scalability in Declarative Program Analysis (with Choice-Based Combination Pruning)

TL;DR

The paper tackles scalability in declarative program analysis by addressing the uncontrolled growth of results in Datalog-based analyses. It introduces Choice-Bound, a simple and universal technique that extensions non-deterministic choice to multiplicity bounds via a counting field and a hash-based identifier, parameterized by . Applied to large frameworks such as Doop and Symbolic Value-Flow (symvalic) for Ethereum, Choice-Bound delivers substantial performance improvements—often exceeding an order of magnitude ()—while incurring only modest completeness loss. The approach yields a flexible design space for tuning precision vs. performance and demonstrates practical impact by enabling previously intractable analyses to run within reasonable time and resources.

Abstract

In this work, we present a simple, uniform, and elegant solution to the problem, with stunning practical effectiveness and application to virtually any Datalog-based analysis. The approach consists of leveraging the choice construct, supported natively in modern Datalog engines like Soufflé. The choice construct allows the definition of functional dependencies in a relation and has been used in the past for expressing worklist algorithms. We show a near-universal construction that allows the choice construct to flexibly limit evaluation of predicates. The technique is applicable to practically any analysis architecture imaginable, since it adaptively prunes evaluation results when a (programmer-controlled) projection of a relation exceeds a desired cardinality. We apply the technique to probably the largest, pre-existing Datalog analysis frameworks in existence: Doop (for Java bytecode) and the main client analyses from the Gigahorse framework (for Ethereum smart contracts). Without needing to understand the existing analysis logic and with minimal, local-only changes, the performance of each framework increases dramatically, by over 20x for the hardest inputs, with near-negligible sacrifice in completeness.

Paper Structure

This paper contains 28 sections, 8 figures.

Figures (8)

  • Figure 1: Simple specification of value-flow (points-to) analysis, with field sensitivity.
  • Figure 2: Execution times shown in seconds for the default 2-obj analysis and Choice-Bound.
  • Figure 3: Speedup per benchmark. The numbers shown are multiplicative factors. The three highest bars are under-estimates, since the default analysis never terminated, in 24hrs.
  • Figure 4: App methods reachability for the default 2-obj analysis and 2-obj with Choice-Bound.
  • Figure 5: App May Fail Casts and Completeness Loss.
  • ...and 3 more figures