Data-Driven Template-Free Invariant Generation
Yuan Xia, Jyotirmoy V. Deshmukh, Mukund Raghothaman, Srivatsan Ravi
TL;DR
The paper tackles state explosion in verifying concurrent programs by learning inductive invariants from data rather than relying on templates. It introduces InvGen, a data-driven, template-free invariant synthesis approach that views invariants as classifiers learned from positive (observed) and speculative negative states, employing decision-tree learning and syntax-guided synthesis to produce compact predicates. A runtime monitoring loop continuously tests the current invariant against new executions, triggering refinements when refutations occur and providing probabilistic guarantees via statistical model checking with parameter $\epsilon$. For finite-state distributed programs, the method is shown to converge in probability to a tight over-approximation of the reachable state set, and experiments on Promela/Spin demonstrate effectiveness with substantially fewer explored states than exhaustive methods.
Abstract
Automatic verification of concurrent programs faces state explosion due to the exponentially many possible interleavings of their sequential components, coupled with large or infinite state spaces. An alternative is deductive verification, where, given a candidate invariant, we establish that it is inductive and show that any state satisfying the invariant is also safe. However, learning (inductive) program invariants is difficult. To this end, we propose a data-driven procedure to synthesize program invariants, where it is assumed that the program invariant is an expression that characterizes a (hopefully tight) over-approximation of the reachable program states. The main ideas of our approach are: (1) We treat a candidate invariant as a classifier separating states observed in (sampled) program traces from those speculated to be unreachable. (2) We develop an enumerative, template-free approach to learn such classifiers from positive and negative examples. At its core, our enumerative approach employs decision trees to generate expressions that do not over-fit to the observed states (and thus generalize). (3) We employ a runtime framework to monitor program executions that may refute the candidate invariant; every refutation triggers a revision of the candidate invariant. Our runtime framework can be viewed as an instance of statistical model checking, which gives us probabilistic guarantees on the candidate invariant. We also show that, in some cases, our counterexample-guided inductive synthesis approach converges (in probability) to an over-approximation of the reachable set of states. Our experimental results show that our framework excels in learning useful invariants using only a fraction of the set of reachable states for a wide variety of concurrent programs.
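The refute-and-revise loop in idea (3) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the paper's implementation: the toy two-process program, the function names, and in particular the learner, which is a simple bounding-box (interval) learner standing in for the decision-tree and enumerative synthesis described above. The stopping rule uses a standard statistical-model-checking argument: if the probability that a random run refutes the candidate is at least $\epsilon$, then $N$ consecutive non-refuting runs occur with probability at most $(1-\epsilon)^N$, so $N \ge \ln\delta / \ln(1-\epsilon)$ runs give confidence $1-\delta$. The paper's exact bound may differ.

```python
import math
import random

def samples_needed(epsilon, delta):
    """Consecutive non-refuting runs required so that, with confidence
    1 - delta, a random run refutes the invariant with probability < epsilon."""
    return math.ceil(math.log(delta) / math.log(1.0 - epsilon))

def run_program(rng, steps=20):
    """Hypothetical toy program with state (x, y), starting at (0, 0).
    Process 1 increments x while x < 3; process 2 increments y while y < x.
    A random scheduler picks one process per step. Returns the visited states."""
    x, y = 0, 0
    trace = [(x, y)]
    for _ in range(steps):
        if rng.random() < 0.5:
            if x < 3:
                x += 1
        elif y < x:
            y += 1
        trace.append((x, y))
    return trace

def learn_box(states):
    """Stand-in learner (assumption): the smallest box containing all
    observed states, i.e. an over-approximation of what has been seen."""
    xs = [s[0] for s in states]
    ys = [s[1] for s in states]
    return (min(xs), max(xs), min(ys), max(ys))

def holds(inv, state):
    """Check whether a state satisfies the box invariant."""
    x_lo, x_hi, y_lo, y_hi = inv
    return x_lo <= state[0] <= x_hi and y_lo <= state[1] <= y_hi

def synthesize(epsilon, delta, seed):
    """Monitor sampled runs; revise the candidate on every refutation and
    stop after enough consecutive runs pass without refuting it."""
    rng = random.Random(seed)
    positives = set(run_program(rng))
    inv = learn_box(positives)
    needed = samples_needed(epsilon, delta)
    passed = 0
    while passed < needed:
        trace = run_program(rng)
        if all(holds(inv, s) for s in trace):
            passed += 1
        else:
            # Refutation: add the new states and relearn, then restart the count.
            positives.update(trace)
            inv = learn_box(positives)
            passed = 0
    return inv

if __name__ == "__main__":
    print(samples_needed(0.05, 0.01))
    print(synthesize(0.05, 0.01, seed=0))
```

The loop terminates on this toy program because the box can only grow and the reachable states are confined to $0 \le y \le x \le 3$. The box learner cannot express the relation $y \le x$, which is one reason a richer, template-free expression language of the kind the abstract describes yields tighter invariants.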
