B-Side: Binary-Level Static System Call Identification

Gaspard Thévenon, Kevin Nguetchouang, Kahina Lazri, Alain Tchana, Pierre Olivier

TL;DR

B-Side is a static binary analysis tool that identifies a superset of the system calls that an x86-64 static or dynamic executable may invoke at runtime. It achieves a good degree of precision by combining symbolic execution with a heuristic that detects system call wrappers, an important source of precision loss in existing work.

Abstract

System call filtering is widely used to secure programs in multi-tenant environments, and to sandbox applications in modern desktop software deployment and package management systems. Filtering rules are hard to write and maintain manually, so generating them automatically is essential. This requires analysis tools able to identify every system call that a program can legitimately invoke. Existing static analysis approaches lack precision because of a high number of false positives, and/or assume that the source code of the program and its libraries is available, which is unrealistic in many scenarios such as cloud production environments. We present B-Side, a static binary analysis tool able to identify a superset of the system calls that an x86-64 static or dynamic executable may invoke at runtime. B-Side assumes no access to program or library sources, and achieves a good degree of precision by combining symbolic execution with a heuristic that detects system call wrappers, an important source of precision loss in existing work. B-Side can also statically detect phases of execution in a program, to which different filtering rules can be applied. We validate B-Side and demonstrate its higher precision compared to state-of-the-art work: over a set of popular applications, B-Side's average $F_1$ score is 0.81, vs. 0.31 and 0.53 for competitors. Over 557 statically and dynamically compiled binaries taken from the Debian repositories, B-Side identifies an average of 43 system calls, vs. 271 and 95 for two state-of-the-art competitors. We further evaluate the strictness of the phase-based filtering policies that can be obtained with B-Side.
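
To make the $F_1$ metric concrete, the following is a minimal sketch of how precision, recall, and $F_1$ can be computed when comparing a statically identified system call set against a ground-truth set. The function name and the example sets are hypothetical and are not taken from the paper's evaluation; the sketch only assumes the standard definition of $F_1$ as the harmonic mean of precision and recall.

```python
def f1_score(identified, ground_truth):
    """F1 of an identified syscall set against a ground-truth set.

    Both arguments are sets of system call names (hypothetical data).
    """
    true_positives = len(identified & ground_truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(identified)    # penalizes false positives
    recall = true_positives / len(ground_truth)     # penalizes missed calls
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the tool reports a superset of what the program uses.
ground_truth = {"read", "write", "openat", "close"}
identified = ground_truth | {"mmap", "execve", "socket", "ptrace"}

score = f1_score(identified, ground_truth)
```

Because the identified set here is a superset of the ground truth, recall is 1 and the score is driven entirely by precision, which is why over-estimation (false positives) lowers $F_1$.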

Paper Structure

This paper contains 36 sections, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Scenarios in which the immediate defining the system call type is: in the same basic block as the syscall instruction (A); in a different basic block (B); and propagated through memory on the stack (C).
  • Figure 2: A function with many predecessors between the immediate definition and the syscall instruction makes symbolic exploration difficult (A); a system call wrapper function leads to a high over-estimation of the system call set (B).
  • Figure 3: Overview of B-Side's system call identification process divided into 3 main steps.
  • Figure 4: We use active addresses taken, i.e. addresses that are operands of lea instructions reachable from the program's entry point, to overestimate the list of indirect jump targets.
  • Figure 5: System call identification process: starting from a call site, predecessors are selected in BFS mode one after the other as the start node of a forward symbolic execution search. This process continues until all reachable nodes for which the symbolic execution is able to determine the system call type have been found.
  • ...and 4 more figures
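
The backward search described in the caption of Figure 5 can be sketched on a toy control-flow graph. This is an illustrative sketch only: the paper relies on symbolic execution, whereas the code below substitutes a much simpler tracking of constants moved into `rax`, and the graph, block names, and instruction tuples are all hypothetical. It shows the overall shape of the process: starting from the call site, predecessors are visited in BFS order, each one serves as the start of a forward search, and the backward expansion stops at nodes from which the system call number is fully determined.

```python
from collections import deque

# Hypothetical toy CFG: block name -> (instructions, successor blocks).
CFG = {
    "open_path":  ([("mov_imm", "rax", 2)], ["wrapper"]),
    "write_path": ([("mov_imm", "rax", 1)], ["wrapper"]),
    "wrapper":    ([("nop",)], ["site"]),
    "site":       ([("syscall",)], []),
}


def predecessors(cfg, node):
    """Blocks that have an edge to `node`."""
    return [name for name, (_, succs) in cfg.items() if node in succs]


def forward_values(cfg, start, site):
    """Values rax may hold at the syscall in `site`, exploring forward
    from `start`. None means the value is unknown on some path."""
    results = set()
    stack = [(start, None, frozenset())]
    while stack:
        node, rax, seen = stack.pop()
        if node in seen:
            continue  # this toy ignores loops
        seen = seen | {node}
        instructions, succs = cfg[node]
        for ins in instructions:
            if ins[0] == "mov_imm" and ins[1] == "rax":
                rax = ins[2]
            elif ins[0] == "syscall" and node == site:
                results.add(rax)
        if node != site:
            stack.extend((succ, rax, seen) for succ in succs)
    return results


def identify(cfg, site):
    """BFS over predecessors of `site`; each visited node is the start
    of a forward search. Expansion stops where rax is fully resolved."""
    found, visited, queue = set(), {site}, deque([site])
    while queue:
        start = queue.popleft()
        values = forward_values(cfg, start, site)
        found |= {v for v in values if v is not None}
        if values and None not in values:
            continue  # resolved from here, no need to go further back
        for pred in predecessors(cfg, start):
            if pred not in visited:
                visited.add(pred)
                queue.append(pred)
    return found


syscall_numbers = identify(CFG, "site")
```

In this toy graph the search has to walk back through the shared `wrapper` block before reaching the two blocks that set `rax`, and it reports the numbers 1 and 2 (`write` and `open` on x86-64 Linux).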