Table of Contents
Fetching ...

Accurate and Extensible Symbolic Execution of Binary Code based on Formal ISA Semantics

Sören Tempel, Tobias Brandt, Christoph Lüth, Christian Dietrich, Rolf Drechsler

TL;DR

This paper argues that symbolic execution of binary code benefits from an ISA-focused intermediate layer built from formal, machine-readable ISA semantics. It introduces BinSym, a $32$-bit RISC-V symbolic interpreter that lowers binary instructions to SMT by translating ISA primitives (e.g., WriteRegister) rather than hand-crafted IR, using LibRISCV as an executable specification and Z3 as the SMT backend. The approach enables easy extension to custom instructions (demonstrated with a MADD extension) and yields a compact implementation (~$1000$ LOC in Haskell) with competitive performance. Empirical evaluation on RIOT modules and benchmarks shows BinSym achieves coverage comparable to existing engines and uncovers five previously unknown lifter bugs in angr, underscoring the benefits of formal ISA semantics for correctness, extensibility, and reliability.

Abstract

Symbolic execution is an SMT-based software verification and testing technique. Symbolic execution requires tracking performed computations during software simulation to reason about branches in the software under test. The prevailing approach on symbolic execution of binary code tracks computations by transforming the code to be tested to an architecture-independent IR and then symbolically executes this IR. However, the resulting IR must be semantically equivalent to the binary code, making this process complex and error-prone. The semantics of the binary code are specified by the targeted ISA, commonly given in natural language and requiring a manual implementation of the transformation to an IR. In recent years, the use of formal languages to describe ISA semantics in a machine-readable way has gained increased popularity. We investigate the utilization of such formal semantics for symbolic execution of binary code, achieving an accurate representation of instruction semantics. We present a prototype for the RISC-V ISA and conduct a case study to demonstrate that it can be easily extended to additional instructions. Furthermore, we perform an experimental comparison with prior work which resulted in the discovery of five previously unknown bugs in the ISA implementation of the popular IR-based symbolic executor angr.

Accurate and Extensible Symbolic Execution of Binary Code based on Formal ISA Semantics

TL;DR

This paper argues that symbolic execution of binary code benefits from an ISA-focused intermediate layer built from formal, machine-readable ISA semantics. It introduces BinSym, a -bit RISC-V symbolic interpreter that lowers binary instructions to SMT by translating ISA primitives (e.g., WriteRegister) rather than hand-crafted IR, using LibRISCV as an executable specification and Z3 as the SMT backend. The approach enables easy extension to custom instructions (demonstrated with a MADD extension) and yields a compact implementation (~ LOC in Haskell) with competitive performance. Empirical evaluation on RIOT modules and benchmarks shows BinSym achieves coverage comparable to existing engines and uncovers five previously unknown lifter bugs in angr, underscoring the benefits of formal ISA semantics for correctness, extensibility, and reliability.

Abstract

Symbolic execution is an SMT-based software verification and testing technique. Symbolic execution requires tracking performed computations during software simulation to reason about branches in the software under test. The prevailing approach on symbolic execution of binary code tracks computations by transforming the code to be tested to an architecture-independent IR and then symbolically executes this IR. However, the resulting IR must be semantically equivalent to the binary code, making this process complex and error-prone. The semantics of the binary code are specified by the targeted ISA, commonly given in natural language and requiring a manual implementation of the transformation to an IR. In recent years, the use of formal languages to describe ISA semantics in a machine-readable way has gained increased popularity. We investigate the utilization of such formal semantics for symbolic execution of binary code, achieving an accurate representation of instruction semantics. We present a prototype for the RISC-V ISA and conduct a case study to demonstrate that it can be easily extended to additional instructions. Furthermore, we perform an experimental comparison with prior work which resulted in the discovery of five previously unknown bugs in the ISA implementation of the popular IR-based symbolic executor angr.
Paper Structure (13 sections, 6 figures, 1 table)

This paper contains 13 sections, 6 figures, 1 table.

Figures (6)

  • Figure 1: Different translation methodologies for symbolic execution.
  • Figure 2: Generation of an SMT solver query for an exemplary branch condition in a SUT (in binary form) based on a formal ISA specification.
  • Figure 3: YAML riscv-opcodes description of a custom MADD instruction.
  • Figure 4: Description of the custom MADD semantics in LibRISCV.
  • Figure 5: Example code which results in false-positive and false-negative when analyzed using the existing angr SE engine due to a bug in the RISC-V lifter.
  • ...and 1 more figures