Ownership in low-level intermediate representation
Siddharth Priya, Arie Gurfinkel
TL;DR
This work extends ownership semantics to low-level intermediate representations by introducing OSEA-IR, a fat-pointer cache mechanism, and prophecy-based modeling to handle mutable borrows, enabling cache-based verification instead of full memory-model reasoning. It provides a practical path toward low-level verification by adding multiword memory support, unique pointers, and cache-control operators, while preserving a formal borrow-stack discipline. The VC generation leverages a sym-based translation with prophecy variables to ensure cache equivalence between pointer caches and memory, and is implemented in SeaBMC with a macro-based C API that translates ownership-annotated code to OSEA-IR for SMT solving. Empirically, the approach yields speedups of $1.3x-5x$ on mature open-source C code, with the magnitude of improvement depending on the extent to which memory accesses can be cached, demonstrating practical benefits for verifying low-level memory manipulation programs.
Abstract
The concept of ownership in high level languages can aid both the programmer and the compiler to reason about the validity of memory operations. Previously, ownership semantics has been used successfully in high level automatic program verification to model a reference to data by a first order logic (FOL) representation of data instead of maintaining an address map. However, ownership semantics is not used in low level program verification. We have identified two challenges. First, ownership information is lost when a program is compiled to a low level intermediate representation (e.g., in LLVM IR). Second, pointers in low level programs point to bytes using an address map (e.g., in unsafe Rust) and thus the verification condition (VC) cannot always replace a pointer by its FOL abstraction. To remedy the situation, we develop ownership semantics for an LLVM like low level intermediate representation. Using these semantics, the VC can opportunistically model some memory accesses by a direct access of a pointer cache that stores byte representation of data. This scheme reduces instances where an address map must be maintained, especially for mostly safe programs that follow ownership semantics. For unsafe functionality, memory accesses are modelled by operations on an address map and we provide mechanisms to keep the address map and pointer cache in sync. We implement these semantics in SEABMC, a bit precise bounded model checker for LLVM. For evaluation, the source programs are assumed to be written in C. Since C does not have ownership built in, suitable macros are added that introduce and preserve ownership during translation to LLVM like IR for verification. This approach is evaluated on mature open source C code. For both handcrafted benchmarks and practical programs, we observe a speedup of $1.3x-5x$ during SMT solving.
