Formally Verified Binary-level Pointer Analysis
Freek Verbeek, Ali Shokri, Daniel Engel, Binoy Ravindran
TL;DR
This work tackles the challenge of binary-level pointer analysis by proposing a formally proven, over-approximative approach whose core is a polymorphic abstract-domain framework. By deriving eight generic proof obligations and proving them in Isabelle/HOL, the authors enable easy instantiation with different domains balancing precision and scalability. They instantiate three domains—Constant Computations (C), Bases (B), and Sources (S)—and demonstrate context-sensitive, interprocedural analysis with soundness and practical precision on large binary datasets. The evaluation shows strong recall and competitive precision, with scalable performance up to hundreds of thousands of instructions, suggesting significant practical impact for disassembly, decompilation, and verification tasks. Overall, the paper provides a rigorous, configurable foundation for binary pointer analysis that can support robust, automated binary analysis pipelines.
Abstract
Binary-level pointer analysis can be of use in symbolic execution, testing, verification, and decompilation of software binaries. In various such contexts, it is crucial that the result is trustworthy, i.e., it can be formally established that the pointer designations are overapproximative. This paper presents an approach to formally proven correct binary-level pointer analysis. A salient property of our approach is that it first generically considers what proof obligations a generic abstract domain for pointer analysis must satisfy. This allows easy instantiation of different domains, varying in precision, while preserving the correctness of the analysis. In the trade-off between scalability and precision, such customization allows "meaningful" precision (sufficiently precise to ensure basic sanity properties, such as that relevant parts of the stack frame are not overwritten during function execution) while also allowing coarse analysis when pointer computations have become too obfuscated during compilation for sound and accurate bounds analysis. We experiment with three different abstract domains with high, medium, and low precision. Evaluation shows that our approach is able to derive designations for memory writes soundly in COTS binaries, in a context-sensitive interprocedural fashion.
