Finding Cross-rule Optimization Bugs in Datalog Engines

Chi Zhang, Linzhang Wang, Manuel Rigger

TL;DR

The paper tackles the detection of cross-rule optimization bugs in Datalog engines, which arise when an engine optimizes multiple rules together and produces incorrect results. It introduces Incremental Rule Evaluation (IRE), a black-box approach that builds a reference program in which rules are evaluated individually and incrementally, and compares its results with those of the optimized program on the engine under test. Implemented as the tool Deopt, the approach was evaluated on four mature engines (Soufflé, CozoDB, μZ, and DDlog) and found 30 bugs, 13 of them logic bugs. Deopt can detect all bugs found by queryFuzz, while queryFuzz might be unable to detect 5 of the bugs that Deopt identified. Incremental test-case generation is also more efficient than naive random generation: for test cases with 60 rules, it produces 1.17× (DDlog) to 31.02× (Soufflé) as many valid test cases with non-empty results. The authors argue that the simplicity and generality of IRE make it practical to adopt.

Abstract

Datalog is a popular and widely used declarative logic programming language. Datalog engines apply many cross-rule optimizations; bugs in them can cause incorrect results. To detect such optimization bugs, we propose an automated testing approach called Incremental Rule Evaluation (IRE), which synergistically tackles the test oracle and test case generation problems. The core idea behind the test oracle is to compare the results of an optimized program and a program without cross-rule optimization; any difference indicates a bug in the Datalog engine. Our core insight is that, for an optimized, incrementally generated Datalog program, we can evaluate all rules individually by constructing a reference program to disable the optimizations that are performed among multiple rules. Incrementally generating test cases not only allows us to apply the test oracle for every newly generated rule; we can also ensure that every newly added rule generates a non-empty result with a given probability and eschew recomputing already-known facts. We implemented IRE as a tool named Deopt, and evaluated Deopt on four mature Datalog engines, namely Soufflé, CozoDB, μZ, and DDlog, and discovered a total of 30 bugs. Of these, 13 were logic bugs, while the remainder were crash and error bugs. Deopt can detect all bugs found by queryFuzz, a state-of-the-art approach. Out of the bugs identified by Deopt, queryFuzz might be unable to detect 5. Our incremental test case generation approach is efficient; for example, for test cases containing 60 rules, our incremental approach can produce 1.17$\times$ (for DDlog) to 31.02$\times$ (for Soufflé) as many valid test cases with non-empty results as the naive random method. We believe that the simplicity and the generality of the approach will lead to its wide adoption in practice.
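
The test oracle described in the abstract can be illustrated with a minimal sketch. It is not Deopt's implementation: a toy in-memory evaluator stands in for a real Datalog engine, and the abstract does not spell out how the reference program is executed, so evaluating one rule at a time in Python is an assumption made here. All names (`eval_rule`, `reference_eval`, `check_oracle`, `buggy_engine`) are hypothetical.

```python
# Toy encoding (an assumption of this sketch): a rule is (head, body),
# an atom is (relation_name, terms). Strings starting with an uppercase
# letter are variables; everything else is a constant.

def is_var(term):
    return isinstance(term, str) and term[:1].isupper()

def match(terms, fact, binding):
    """Extend `binding` so that `terms` matches `fact`; None on failure."""
    extended = dict(binding)
    for term, value in zip(terms, fact):
        if is_var(term):
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended

def eval_rule(rule, db):
    """Evaluate a single rule once against db (relation -> set of tuples)."""
    (_, head_terms), body = rule
    bindings = [{}]
    for rel, terms in body:
        next_bindings = []
        for binding in bindings:
            for fact in db.get(rel, set()):
                extended = match(terms, fact, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return {tuple(b[t] if is_var(t) else t for t in head_terms)
            for b in bindings}

def reference_eval(rules, facts):
    """Reference result: apply rules one at a time until nothing changes,
    so no optimization can span several rules."""
    db = {rel: set(tuples) for rel, tuples in facts.items()}
    changed = True
    while changed:
        changed = False
        for rule in rules:
            head_rel = rule[0][0]
            new = eval_rule(rule, db) - db.setdefault(head_rel, set())
            if new:
                db[head_rel] |= new
                changed = True
    return db

def check_oracle(rules, facts, engine_under_test):
    """Return the relations whose results differ from the reference."""
    expected = reference_eval(rules, facts)
    actual = engine_under_test(rules, facts)
    return {rel for rel in expected.keys() | actual.keys()
            if expected.get(rel, set()) != actual.get(rel, set())}

facts = {"edge": {(1, 2), (2, 3)}}
rules = [
    (("path", ("X", "Y")), [("edge", ("X", "Y"))]),
    (("path", ("X", "Z")), [("path", ("X", "Y")), ("edge", ("Y", "Z"))]),
]

def buggy_engine(rules, facts):
    """Hypothetical engine that loses one derived tuple."""
    db = reference_eval(rules, facts)
    db["path"] = db["path"] - {(1, 3)}
    return db

print(check_oracle(rules, facts, buggy_engine))
```

Any relation reported by `check_oracle` marks a discrepancy between the engine's result and the rule-by-rule reference result, which is the signal the oracle treats as a potential bug.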

Paper Structure (61 sections, 1 equation, 13 figures, 2 tables, 1 algorithm)

Figures (13)

  • Figure 1: A Datalog program used in points-to analysis.
  • Figure 2: Illustrative example, triggering a bug in CozoDB.
  • Figure 3: Overview of IRE. The initial step is not included in the iterations, as it is required only once. In iteration $n$, $r_n$ denotes the newly generated rule, $G_n$ denotes the precedence graph, $P^{ref}_{n}$ and $P^{opt}_{n}$ denote the reference and optimized programs respectively. The symbol + indicates that $r_n$ is appended to the optimized program from the previous iteration.
  • Figure 4: A test case in iteration $3$, where care is required to evaluate the rules in the correct order. The results displayed in the comments already reach the fixpoint before iteration $3$. The graph at right is the precedence graph of this test case.
  • Figure 5: A bug-inducing test case for μZ. We have omitted declarations in this test case for simplicity. This bug could be triggered regardless of whether the relations involved have facts.
  • ...and 8 more figures