Verifying Peephole Rewriting In SSA Compiler IRs

Siddharth Bhat, Alex Keizer, Chris Hughes, Andrés Goens, Tobias Grosser

TL;DR

The paper tackles verifying peephole rewrites across domain-specific SSA-based IRs by introducing a core calculus that supports regions and by implementing it in Lean as LeanMLIR(X) with an MLIR syntax embedding. It provides a verified peephole rewriter and builds two canonical SSA optimizations (DCE and CSE), together with automation tactics to keep proof goals manageable. Three MLIR-based case studies (bitvectors, structured control flow, and fully homomorphic encryption) demonstrate the approach’s extensibility to diverse domains, including a QuotRing IR modeling $R = (\mathbb{Z}/q\mathbb{Z})[X]/(X^{2^n}+1)$. The work enables formally verified rewrites on new domain-specific IRs, offering a practical bridge between automation (SMT-like tools) and interactive theorem proving for compiler reasoning with regions and SSA def-use chains.

Abstract

There is an increasing need for domain-specific reasoning in modern compilers. This has fueled the use of tailored intermediate representations (IRs) based on static single assignment (SSA), as in the MLIR compiler framework. Interactive theorem provers (ITPs) provide strong guarantees for the end-to-end verification of compilers (e.g., CompCert). However, modern compilers and their IRs evolve at a rate that makes proof engineering alongside them prohibitively expensive. Nevertheless, well-scoped push-button automated verification tools such as the Alive peephole verifier for LLVM-IR have gained recognition in domains where SMT solvers offer efficient (semi) decision procedures. In this paper, we aim to combine the convenience of automation with the versatility of ITPs for verifying peephole rewrites across domain-specific IRs. We formalize a core calculus for SSA-based IRs that is generic over the IR and covers so-called regions (nested scoping used by many domain-specific IRs in the MLIR ecosystem). Our mechanization in the Lean proof assistant provides a user-friendly frontend for translating MLIR syntax into our calculus. We provide scaffolding for defining and verifying peephole rewrites, offering tactics to eliminate the abstraction overhead of our SSA calculus. We prove correctness theorems about peephole rewriting, as well as two classical program transformations. To evaluate our framework, we consider three use cases from the MLIR ecosystem that cover different levels of abstraction: (1) bitvector rewrites from LLVM, (2) structured control flow, and (3) fully homomorphic encryption. We envision that our mechanization provides a foundation for formally verified rewrites on new domain-specific IRs.
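
To make the notion of a peephole rewrite on an SSA program concrete, the following is a minimal Python sketch, not the paper's Lean mechanization. It models a program as a list of let-bindings over fixed-width bitvectors and checks a rewrite by brute-force enumeration rather than by proof. The names `WIDTH`, `OPS`, `run`, and `equivalent`, the 4-bit width, and the simplified semantics (no poison or undefined behaviour, unlike LLVM) are all assumptions made for illustration.

```python
from itertools import product

WIDTH = 4                      # assumed toy bit width, small enough to enumerate
MASK = (1 << WIDTH) - 1

# Hypothetical semantics for a few bitvector operations (wrap-around arithmetic).
OPS = {
    "add": lambda a, b: (a + b) & MASK,
    "mul": lambda a, b: (a * b) & MASK,
    "shl": lambda a, b: (a << b) & MASK,
}

def run(program, ret, env):
    """Evaluate a list of SSA let-bindings (dest, op, args) and return `ret`."""
    values = dict(env)
    for dest, op, args in program:
        if op == "const":
            values[dest] = args[0] & MASK
        else:
            values[dest] = OPS[op](*(values[a] for a in args))
    return values[ret]

def equivalent(lhs, rhs, inputs, ret="r"):
    """Check that lhs and rhs agree on every assignment to the free variables."""
    for point in product(range(1 << WIDTH), repeat=len(inputs)):
        env = dict(zip(inputs, point))
        if run(lhs, ret, env) != run(rhs, ret, env):
            return False
    return True

# Candidate rewrite: x * 2  ==>  x << 1
MUL_BY_TWO = [("c", "const", (2,)), ("r", "mul", ("x", "c"))]
SHL_BY_ONE = [("c", "const", (1,)), ("r", "shl", ("x", "c"))]
# Incorrect candidate: x * 2  ==>  x << 2
SHL_BY_TWO = [("c", "const", (2,)), ("r", "shl", ("x", "c"))]

print(equivalent(MUL_BY_TWO, SHL_BY_ONE, ["x"]))
print(equivalent(MUL_BY_TWO, SHL_BY_TWO, ["x"]))
```

The first call prints `True` and the second prints `False`. Exhaustive enumeration only works for tiny widths; the paper instead proves such equivalences in Lean, where a proof can cover arbitrary widths and domains that are not finite.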

Paper Structure (22 sections, 5 figures)

This paper contains 22 sections, 5 figures.

Figures (5)

  • Figure 1: User definitions for QuotRing, which declares the operations and types of the IR, the type signatures of the operations, and the denotations of the types and operations into Lean types.
  • Figure 2: A peephole rewrite in $\texttt{LeanMLIR}(\operatorname{QuotRing})$ asserts the semantic equivalence of two SSA programs given in MLIR syntax. Our proof automation through simp_peephole eliminates the framework overhead, such that closing a clean mathematical goal suffices to prove correctness.
  • Figure 3: Definitions in $\texttt{LeanMLIR}(X)$ for Expr and Com, and their associated denotations.
  • Figure 4: Extending $\texttt{LeanMLIR}(X)$ with regions. New fields are in green. In OpDenote, one can now access the sub-computation represented by the region when defining the semantics of Op.
  • Figure 5: Simplified implementation of $\texttt{LeanMLIR}(\operatorname{scf}(X))$. Observe that the IR is parametrized over another IR, Op', and that we add control flow to that IR in a modular fashion.
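
As a rough illustration of the arithmetic that the QuotRing IR of Figure 1 denotes into, the sketch below implements addition and multiplication in $R = (\mathbb{Z}/q\mathbb{Z})[X]/(X^{2^n}+1)$ on coefficient lists. It is a hedged Python sketch, not the paper's Lean definitions: the parameters `Q = 17` and `N = 4` and the helper names `reduce_poly`, `add`, and `mul` are assumptions chosen for readability.

```python
Q = 17   # assumed small coefficient modulus q
N = 4    # assumed degree bound N = 2**n, here with n = 2

def reduce_poly(coeffs):
    """Reduce a coefficient list (lowest degree first) modulo X**N + 1 and Q."""
    out = [0] * N
    for i, c in enumerate(coeffs):
        # X**N == -1 in R, so X**i == (-1)**(i // N) * X**(i % N).
        sign = -1 if (i // N) % 2 else 1
        out[i % N] = (out[i % N] + sign * c) % Q
    return out

def add(a, b):
    """Add two reduced elements of R coefficient-wise."""
    return [(x + y) % Q for x, y in zip(a, b)]

def mul(a, b):
    """Multiply two reduced elements of R (schoolbook product, then reduce)."""
    prod = [0] * (2 * N - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            prod[i + j] += x * y
    return reduce_poly(prod)

modulus = reduce_poly([1, 0, 0, 0, 1])   # X**4 + 1, which is zero in R
a = [3, 5, 0, 16]                        # 3 + 5X + 16X**3
x_squared = [0, 0, 1, 0]                 # X**2

print(modulus)
print(add(a, modulus) == a)
print(mul(x_squared, x_squared))
```

Here `modulus` evaluates to `[0, 0, 0, 0]`, adding it to `a` leaves `a` unchanged, and `X**2 * X**2` reduces to `[16, 0, 0, 0]`, that is, $-1 \bmod 17$. Identities of this kind, which hold because of the ring structure rather than bit-level reasoning, are the sort of domain-specific fact that a QuotRing rewrite would rely on.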