Table of Contents
Fetching ...

Formally Verified Binary-level Pointer Analysis

Freek Verbeek, Ali Shokri, Daniel Engel, Binoy Ravindran

TL;DR

This work tackles the challenge of binary-level pointer analysis by proposing a formally proven, over-approximative approach whose core is a polymorphic abstract-domain framework. By deriving eight generic proof obligations and proving them in Isabelle/HOL, the authors enable easy instantiation with different domains balancing precision and scalability. They instantiate three domains—Constant Computations (C), Bases (B), and Sources (S)—and demonstrate context-sensitive, interprocedural analysis with soundness and practical precision on large binary datasets. The evaluation shows strong recall and competitive precision, with scalable performance up to hundreds of thousands of instructions, suggesting significant practical impact for disassembly, decompilation, and verification tasks. Overall, the paper provides a rigorous, configurable foundation for binary pointer analysis that can support robust, automated binary analysis pipelines.

Abstract

Binary-level pointer analysis can be of use in symbolic execution, testing, verification, and decompilation of software binaries. In various such contexts, it is crucial that the result is trustworthy, i.e., it can be formally established that the pointer designations are overapproximative. This paper presents an approach to formally proven correct binary-level pointer analysis. A salient property of our approach is that it first generically considers what proof obligations a generic abstract domain for pointer analysis must satisfy. This allows easy instantiation of different domains, varying in precision, while preserving the correctness of the analysis. In the trade-off between scalability and precision, such customization allows "meaningful" precision (sufficiently precise to ensure basic sanity properties, such as that relevant parts of the stack frame are not overwritten during function execution) while also allowing coarse analysis when pointer computations have become too obfuscated during compilation for sound and accurate bounds analysis. We experiment with three different abstract domains with high, medium, and low precision. Evaluation shows that our approach is able to derive designations for memory writes soundly in COTS binaries, in a context-sensitive interprocedural fashion.

Formally Verified Binary-level Pointer Analysis

TL;DR

This work tackles the challenge of binary-level pointer analysis by proposing a formally proven, over-approximative approach whose core is a polymorphic abstract-domain framework. By deriving eight generic proof obligations and proving them in Isabelle/HOL, the authors enable easy instantiation with different domains balancing precision and scalability. They instantiate three domains—Constant Computations (C), Bases (B), and Sources (S)—and demonstrate context-sensitive, interprocedural analysis with soundness and practical precision on large binary datasets. The evaluation shows strong recall and competitive precision, with scalable performance up to hundreds of thousands of instructions, suggesting significant practical impact for disassembly, decompilation, and verification tasks. Overall, the paper provides a rigorous, configurable foundation for binary pointer analysis that can support robust, automated binary analysis pipelines.

Abstract

Binary-level pointer analysis can be of use in symbolic execution, testing, verification, and decompilation of software binaries. In various such contexts, it is crucial that the result is trustworthy, i.e., it can be formally established that the pointer designations are overapproximative. This paper presents an approach to formally proven correct binary-level pointer analysis. A salient property of our approach is that it first generically considers what proof obligations a generic abstract domain for pointer analysis must satisfy. This allows easy instantiation of different domains, varying in precision, while preserving the correctness of the analysis. In the trade-off between scalability and precision, such customization allows "meaningful" precision (sufficiently precise to ensure basic sanity properties, such as that relevant parts of the stack frame are not overwritten during function execution) while also allowing coarse analysis when pointer computations have become too obfuscated during compilation for sound and accurate bounds analysis. We experiment with three different abstract domains with high, medium, and low precision. Evaluation shows that our approach is able to derive designations for memory writes soundly in COTS binaries, in a context-sensitive interprocedural fashion.

Paper Structure

This paper contains 13 sections, 2 theorems, 13 equations, 6 figures, 2 tables.

Key Result

Theorem 1

Function $\gamma_\mathbb{S}$ is a simulation relationbaier2008principles between the concrete and abstract semantics:

Figures (6)

  • Figure 1: Examples of Proof Obligations
  • Figure 2: Assembly code. For sake of presentation, pseudo code is given on the right. $\mathit{offset}$ denotes some dynamically computed offset. Instruction lea loads an address into register rsi without reading from memory. The binary has a data section with address range $\mathtt{0x2000}$ to $\mathtt{0x2040}$.
  • Figure 3: Separation over abstract pointers: the smallest symmetric relation such that the above holds.
  • Figure 4: Example of Indirect Call
  • Figure 5: Assembly code, decompiled to code with variables based on abstract pointers.
  • ...and 1 more figures

Theorems & Definitions (6)

  • Theorem 1
  • Theorem 2
  • Example 1
  • Example 2
  • Example 3
  • Definition 1