Ownership in low-level intermediate representation

Siddharth Priya; Arie Gurfinkel

Ownership in low-level intermediate representation

Siddharth Priya, Arie Gurfinkel

TL;DR

This work extends ownership semantics to low-level intermediate representations by introducing OSEA-IR, a fat-pointer cache mechanism, and prophecy-based modeling to handle mutable borrows, enabling cache-based verification instead of full memory-model reasoning. It provides a practical path toward low-level verification by adding multiword memory support, unique pointers, and cache-control operators, while preserving a formal borrow-stack discipline. The VC generation leverages a sym-based translation with prophecy variables to ensure cache equivalence between pointer caches and memory, and is implemented in SeaBMC with a macro-based C API that translates ownership-annotated code to OSEA-IR for SMT solving. Empirically, the approach yields speedups of $1.3x-5x$ on mature open-source C code, with the magnitude of improvement depending on the extent to which memory accesses can be cached, demonstrating practical benefits for verifying low-level memory manipulation programs.

Abstract

The concept of ownership in high level languages can aid both the programmer and the compiler to reason about the validity of memory operations. Previously, ownership semantics has been used successfully in high level automatic program verification to model a reference to data by a first order logic (FOL) representation of data instead of maintaining an address map. However, ownership semantics is not used in low level program verification. We have identified two challenges. First, ownership information is lost when a program is compiled to a low level intermediate representation (e.g., in LLVM IR). Second, pointers in low level programs point to bytes using an address map (e.g., in unsafe Rust) and thus the verification condition (VC) cannot always replace a pointer by its FOL abstraction. To remedy the situation, we develop ownership semantics for an LLVM like low level intermediate representation. Using these semantics, the VC can opportunistically model some memory accesses by a direct access of a pointer cache that stores byte representation of data. This scheme reduces instances where an address map must be maintained, especially for mostly safe programs that follow ownership semantics. For unsafe functionality, memory accesses are modelled by operations on an address map and we provide mechanisms to keep the address map and pointer cache in sync. We implement these semantics in SEABMC, a bit precise bounded model checker for LLVM. For evaluation, the source programs are assumed to be written in C. Since C does not have ownership built in, suitable macros are added that introduce and preserve ownership during translation to LLVM like IR for verification. This approach is evaluated on mature open source C code. For both handcrafted benchmarks and practical programs, we observe a speedup of $1.3x-5x$ during SMT solving.

Ownership in low-level intermediate representation

TL;DR

on mature open-source C code, with the magnitude of improvement depending on the extent to which memory accesses can be cached, demonstrating practical benefits for verifying low-level memory manipulation programs.

Abstract

during SMT solving.

Paper Structure (9 sections, 2 theorems, 3 equations, 12 figures, 3 tables)

This paper contains 9 sections, 2 theorems, 3 equations, 12 figures, 3 tables.

Introduction
OSEA-IR language
VC Generation
Towards a practical machine
Evaluation
Related work
Conclusion
Appendix A
Appendix B

Key Result

Theorem 1

For all pointers in the register map, if the pointer is top-of-(borrow)stack and is owning or mutably borrowed then the pointer cache value is the same as the value of memory at address of the pointer. Formally, let $R$ be a register map, $M$ memory, and $S_B$ a borrow store. Then,

Figures (12)

Figure 1: Ownership instr. in OSEA-IR grammar, where R, P, and M are scalar registers, pointer registers, and memory registers respectively.
Figure 2: Example of $\mathcal{M}_{0}$ operation. Effect on register map ($R$), memory map ($M$), and borrow store ($S_{B}$) shown in pink.
Figure 3: Verification condition (VC) shown in yellow.
Figure 4: Definition of $sym$ for .
Figure 5: Definition of $sym$ for mutable borrow, die, and $tngle$ macro for entanglement.
...and 7 more figures

Theorems & Definitions (3)

Theorem 1: Cache equivalence
proof
Theorem 2

Ownership in low-level intermediate representation

TL;DR

Abstract

Ownership in low-level intermediate representation

Authors

TL;DR

Abstract

Table of Contents

Key Result

Figures (12)

Theorems & Definitions (3)