CF-GKAT: Efficient Validation of Control-Flow Transformations

Cheng Zhang, Tobias Kappé, David E. Narváez, Nico Naus

TL;DR

CF-GKAT extends Guarded Kleene Algebra with Tests to address non-local control flow and embedded primitive semantics by introducing indicator variables and a continuation-based semantics. It constructs CF-GKAT automata, lowers them to GKAT automata, and decides trace equivalence via GKAT bisimilarity, achieving near-linear performance in program size for fixed test variables. The approach enables reliable validation of control-flow transformations such as goto-elimination and decompiler outputs, demonstrated through a GNU Coreutils case study and a Ghidra-based decompilation pipeline. This work broadens the applicability of Kleene-algebra-based verification to real-world control-flow challenges, offering a practical verification tool for decompilation and control-flow restructuring tasks.

Abstract

Guarded Kleene Algebra with Tests (GKAT) provides a sound and complete framework to reason about trace equivalence between simple imperative programs. However, there are still several notable limitations. First, GKAT is completely agnostic with respect to the meaning of primitives, to keep equivalence decidable. Second, GKAT excludes non-local control flow such as goto, break, and return. To overcome these limitations, we introduce Control-Flow GKAT (CF-GKAT), a system that allows reasoning about programs that include non-local control flow as well as hardcoded values. CF-GKAT is able to soundly and completely verify trace equivalence of a larger class of programs, while preserving the nearly-linear efficiency of GKAT. This makes CF-GKAT suitable for the verification of control-flow manipulating procedures, such as decompilation and goto-elimination. To demonstrate CF-GKAT's abilities, we validated the output of several highly non-trivial program transformations, such as Erosa and Hendren's goto-elimination procedure and the output of the Ghidra decompiler. CF-GKAT opens up the application of Kleene Algebra to a wider set of challenges, and provides an important verification tool that can be applied to the field of decompilation and control-flow transformation.

Paper Structure

This paper contains 26 sections, 3 theorems, 38 equations, and 3 figures.

Key Result

theorem 1

Given two finite GKAT automata $A_{0}$ and $A_{1}$, it is decidable whether they represent the same guarded language, i.e., whether $\lBrack A_{0} \rBrack = \lBrack A_{1} \rBrack$. The algorithm to do this has nearly-linear complexity $\mathcal{O}(n \cdot \hat{\alpha}(n))$, where ...
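A bound of this shape is typical of bisimilarity checks built on a union-find structure. The following is a minimal sketch of such a check, not the paper's implementation. It assumes a hypothetical encoding in which each state maps every atom to "accept", "reject", or an (action, next state) pair, and it assumes the automata are already normalized so that states which can never accept have been replaced by "reject". All names (`UnionFind`, `bisimilar`, `LOOP`, `UNROLLED`, `OTHER`) are illustrative.

```python
# Sketch only: Hopcroft-Karp-style bisimilarity check with union-find.
# Assumed encoding (hypothetical): automaton[state][atom] is "accept",
# "reject", or a tuple (action, next_state). Missing atoms mean "reject".
# Assumption: dead states have already been normalized to "reject".

class UnionFind:
    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        """Merge the classes of x and y; return False if already merged."""
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        self.size[rx] += self.size[ry]
        return True


def bisimilar(a0, s0, a1, s1, atoms):
    """Check whether state s0 of a0 and state s1 of a1 are bisimilar."""
    autos = {"L": a0, "R": a1}
    uf = UnionFind()
    # States are tagged so equal names in the two automata do not collide.
    todo = [(("L", s0), ("R", s1))]
    while todo:
        p, q = todo.pop()
        if not uf.union(p, q):
            continue  # pair already assumed equivalent
        for atom in atoms:
            op = autos[p[0]][p[1]].get(atom, "reject")
            oq = autos[q[0]][q[1]].get(atom, "reject")
            if isinstance(op, tuple) and isinstance(oq, tuple):
                if op[0] != oq[0]:
                    return False  # different actions under the same atom
                todo.append(((p[0], op[1]), (q[0], oq[1])))
            elif op != oq:
                return False  # accept/reject/step mismatch
    return True


ATOMS = ["b", "!b"]
# while b do p
LOOP = {"w": {"b": ("p", "w"), "!b": "accept"}}
# if b then (p; while b do p)
UNROLLED = {
    "s": {"b": ("p", "t"), "!b": "accept"},
    "t": {"b": ("p", "t"), "!b": "accept"},
}
# while b do q
OTHER = {"w": {"b": ("q", "w"), "!b": "accept"}}

print(bisimilar(LOOP, "w", UNROLLED, "s", ATOMS))
print(bisimilar(LOOP, "w", OTHER, "w", ATOMS))
```

In this example the loop and its one-step unrolling are reported as equivalent, while the loop that runs a different action is not. Each pair of states is merged at most once, so the number of union-find operations grows linearly with the number of states for a fixed set of atoms.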

Figures (3)

  • Figure 1: Different versions of mp_factor_using_pollard_rho in factor.c, part of GNU Coreutils.
  • Figure 2: Different versions of mp_factor_using_pollard_rho in factor.c, part of GNU Coreutils.
  • Figure 3: Plot of the number of blinded functions per cyclomatic complexity number (CCN). The maximum CCN found was 44, yet the majority of the blinded functions have low CCN.

Theorems & Definitions (34)

  • definition 1
  • definition 2
  • definition 3
  • definition 4
  • definition 5
  • definition 6: Continuation semantics, base
  • Remark
  • definition 7: Continuation semantics, branching
  • definition 8: Continuation semantics, sequencing and loops
  • definition 9: Continuation semantics starting from a label, base
  • ...and 24 more