EFACT: an External Function Auto-Completion Tool to Strengthen Static Binary Lifting

Yilei Zhang, Haoyu Liao, Zekun Wang, Bo Huang, Jianmei Guo

TL;DR

EFACT targets the External Function Completion (EXFC) problem in static binary lifting, where missing or mangled declarations from external libraries impair analysis and translation. It introduces a two-phase framework: External Function Extraction to identify EXFs and platform-specific context, and External Function Completion to generate complete declarations via MFC, FPC, and VPC solvers, aided by a Dict auto-generator and a Library Database. EFACT demonstrates 100% EXF coverage on SPEC CPU 2017 libraries (glibc, libstdc++, libgfortran) and delivers substantial gains in mangled-function return-type recovery (up to ~97% over baseline tools) and cross-ISA translation when integrated with McSema (EFACT_MC shows 36.7% and 93.6% improvements on x86-64 → x86-64 and x86-64 → AArch64, respectively). The approach supports multiple languages (C, C++, Fortran, Rust) and libraries (OpenSSL), and provides outputs in LLVM IR and C/C++ for broad applicability, making EXFC more robust and scalable in real-world static binary rewriting tasks.

Abstract

Static binary lifting is essential in binary rewriting frameworks. Existing tools overlook the impact of External Function Completion (EXFC) in static binary lifting. EXFC recovers the prototypes of External Functions (EXFs, functions defined in standard shared libraries) using only the function symbols available. Incorrect EXFC can misinterpret the source binary, or cause memory overflows in static binary translation, which eventually results in program crashes. Notably, existing tools struggle to recover the prototypes of mangled EXFs originating from binaries compiled from C++. Moreover, they require time-consuming manual processing to support new libraries. This paper presents EFACT, an External Function Auto-Completion Tool for static binary lifting. Our EXF recovery algorithm better recovers the prototypes of mangled EXFs, particularly addressing the template specialization mechanism in C++. EFACT is designed as a lightweight plugin to strengthen other static binary rewriting frameworks in EXFC. Our evaluation shows that EFACT outperforms RetDec and McSema in mangled EXF recovery by 96.4% and 97.3%, respectively, on SPEC CPU 2017. Furthermore, we delve deeper into static binary translation and address several cross-ISA EXFC problems. When integrated with McSema, EFACT correctly translates 36.7% more benchmarks from x86-64 to x86-64 and 93.6% more from x86-64 to AArch64 than McSema alone on EEMBC.
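
To make "using only the function symbols available" concrete, the sketch below decodes a very small subset of Itanium C++ ABI mangled names. It is a toy illustration, not EFACT's recovery algorithm: the helper names (`parse_type`, `decode_symbol`) and the example symbols (`foo`, `copy`, `init`) are hypothetical, and only builtin types, pointers, and `const` are handled. It shows that the mangled name of an ordinary (non-template) function carries the parameter types but no return type, which is one reason a symbol alone does not yield a full prototype.

```python
import re

# Builtin type codes from the Itanium C++ ABI mangling grammar (small subset).
BUILTIN = {
    "v": "void", "b": "bool", "c": "char", "s": "short", "i": "int",
    "j": "unsigned int", "l": "long", "m": "unsigned long",
    "f": "float", "d": "double",
}


def parse_type(symbol, pos):
    """Parse one type starting at symbol[pos]; return (type_string, next_pos)."""
    code = symbol[pos]
    if code == "P":  # pointer to the type that follows
        inner, nxt = parse_type(symbol, pos + 1)
        return inner + "*", nxt
    if code == "K":  # const-qualified version of the type that follows
        inner, nxt = parse_type(symbol, pos + 1)
        return inner + " const", nxt
    if code in BUILTIN:
        return BUILTIN[code], pos + 1
    raise ValueError(f"unsupported type code {code!r} at offset {pos}")


def decode_symbol(symbol):
    """Return (name, parameter_types) for a simple mangled free function.

    Handles only symbols of the form _Z<len><name><types>; namespaces,
    templates, and substitutions are deliberately out of scope.
    """
    match = re.match(r"_Z(\d+)", symbol)
    if match is None:
        raise ValueError("not a simple _Z<len><name> symbol")
    start = match.end()
    end = start + int(match.group(1))
    if end >= len(symbol):
        raise ValueError("symbol is truncated")
    name = symbol[start:end]

    params = []
    pos = end
    while pos < len(symbol):
        type_string, pos = parse_type(symbol, pos)
        params.append(type_string)
    if params == ["void"]:  # a lone 'v' means an empty parameter list
        params = []
    return name, params


if __name__ == "__main__":
    for sym in ("_Z3fooi", "_Z4copyPcPKcm", "_Z4initv"):
        name, params = decode_symbol(sym)
        # The return type is printed as '?' because the symbol does not encode it.
        print(f"{sym}: ? {name}({', '.join(params)})")
```

In this sketch every decoded prototype has an unknown return type, so a completion step has to obtain it from somewhere else, for example from library headers. Template function instances do encode a return type in their mangled names, but decoding them requires handling template arguments and substitutions, which this toy omits.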

Paper Structure (37 sections, 12 figures, 13 tables)

Figures (12)

  • Figure 1: Motivating examples of EXFC (source code is unknown in actual scenarios, we put it here to better explain the example).
  • Figure 2: Layers of binary lifting.
  • Figure 3: An overview of EFACT's auto-completion workflow (Func: Function).
  • Figure 4: An abstraction of the detailed implementation of va_list on x86-64 and AArch64 (with GCC 9.4, Ubuntu 20.04).
  • Figure 5: Library Database framework.
  • ...and 7 more figures
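
As background for the va_list comparison named in the Figure 4 caption, the following sketch computes the size of the `va_list` record on the two ISAs. The field layouts are taken from the System V x86-64 psABI and the Arm AAPCS64, not from the paper's figure, and the constant and function names are hypothetical. It is meant only to show that the two records differ in size and field order, so a `va_list` built for one ISA cannot be handed unchanged to an EXF on the other.

```python
import struct

# x86-64 (System V psABI) va_list record:
#   unsigned int gp_offset; unsigned int fp_offset;
#   void *overflow_arg_area; void *reg_save_area;
X86_64_VA_LIST = "<IIQQ"

# AArch64 (AAPCS64) va_list record:
#   void *__stack; void *__gr_top; void *__vr_top;
#   int __gr_offs; int __vr_offs;
AARCH64_VA_LIST = "<QQQii"


def va_list_size(layout):
    """Size in bytes of a va_list record described by a struct format string."""
    # '<' selects standard sizes without padding; neither layout needs padding
    # under natural alignment, so this matches the in-memory size.
    return struct.calcsize(layout)


if __name__ == "__main__":
    print("x86-64 :", va_list_size(X86_64_VA_LIST), "bytes")   # 24 bytes
    print("AArch64:", va_list_size(AARCH64_VA_LIST), "bytes")  # 32 bytes
```

The 24-byte versus 32-byte difference is one plausible way that a mismatched declaration for a variadic EXF could lead to the memory overflows mentioned in the abstract; the paper's own analysis may cover further differences.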

Theorems & Definitions (2)

  • definition 1
  • definition 2