VulMatch: Binary-level Vulnerability Detection Through Signature

Zian Liu, Lei Pan, Chao Chen, Ejaz Ahmed, Shigang Liu, Jun Zhang, Dongxi Liu

TL;DR

Code reuse causes the same vulnerabilities to recur across binaries, yet existing detectors tend to test for patch existence rather than vulnerability existence and derive their signatures from binary code alone. VulMatch proposes a four-step pipeline that uses source-to-binary mapping and context-aware binary signatures to detect vulnerability existence with higher precision and interpretability. It defines three types of vulnerability signature (add, delete, and change) with corresponding patch signatures, each built from source-level diffs and mapped to binary basic blocks together with contextual control-flow information. Empirical results across seven open-source projects and real-world firmware show VulMatch outperforming Asm2vec and Palmtree in top-1 vulnerability detection while providing explanations that aid human analysts, which indicates practical value for both automated and manual vulnerability assessment.
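The add, delete, and change categories can be illustrated at the source level with a small sketch. The code below is not VulMatch's implementation: it assumes the pre-patch and post-patch functions are available as lists of source lines and uses Python's standard `difflib` to label each differing hunk. The function name `classify_patch_hunks` and the toy input lines are hypothetical, and the later steps (mapping hunks to binary instructions and adding control-flow context) are out of scope here.

```python
import difflib


def classify_patch_hunks(pre_lines, post_lines):
    """Label each differing hunk between two versions as add, delete, or change.

    Hypothetical helper for illustration only; it works purely on source
    lines and does not touch binary code.
    """
    # difflib opcode tags mapped onto the three signature types.
    tag_to_type = {"insert": "add", "delete": "delete", "replace": "change"}
    matcher = difflib.SequenceMatcher(a=pre_lines, b=post_lines, autojunk=False)
    hunks = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # lines that remain intact across the two versions
        hunks.append({
            "type": tag_to_type[tag],
            "vulnerable_lines": pre_lines[i1:i2],  # present before the patch
            "patched_lines": post_lines[j1:j2],    # present after the patch
        })
    return hunks


# Toy input, not taken from any real CVE.
pre = ["a", "b", "c", "d"]
post = ["a", "B", "c", "d", "e"]
hunks = classify_patch_hunks(pre, post)
for hunk in hunks:
    print(hunk["type"], hunk["vulnerable_lines"], hunk["patched_lines"])
```

In this sketch an add hunk has no vulnerable-side lines, a delete hunk has no patched-side lines, and a change hunk has both, which matches the intuition that only delete and change patches leave vulnerable lines to search for directly.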

Abstract

Similar vulnerabilities recur in real-world software products because of code reuse, especially in widely reused third-party code and libraries. Detecting recurring vulnerabilities such as 1-day and N-day vulnerabilities is an important cyber security task. Unfortunately, state-of-the-art methods suffer from poor performance because they detect patch existence instead of vulnerability existence and infer the vulnerability signature directly from binary code. In this paper, we propose VulMatch, which extracts precise vulnerability-related binary instructions to generate a vulnerability-related signature and detects vulnerability existence based on binary signatures. Unlike previous approaches, VulMatch accurately locates vulnerability-related instructions by using both source and binary code. Our experiments were conducted using over 1000 vulnerable instances across seven open-source projects. VulMatch significantly outperformed the baseline tools Asm2vec and Palmtree. Beyond its performance advantage over the baseline tools, VulMatch provides explainable reasons for each detection. Our empirical studies demonstrate that VulMatch detects fine-grained vulnerabilities that the state-of-the-art tools struggle with. Our experiment on commercial firmware demonstrates that VulMatch is able to find vulnerabilities in real-world scenarios.

Paper Structure (25 sections, 1 equation, 8 figures, 5 tables)

This paper contains 25 sections, 1 equation, 8 figures, 5 tables.

Figures (8)

  • Figure 1: An example vulnerable function tftp_connect selected from CVE-2019-5482. (a) lists pre-patch source code, and (b) lists post-patch source code. Green lines are the patched source lines. Other lines remain intact across the two versions.
  • Figure 2: Corresponding binary code CFG of function tftp_connect presented in Figure 1. (a) refers to pre-patch version, and (b) refers to post-patch version. Block 1' is a modified block and blocks 3', 4', 5', and 6' are added blocks. Other blocks remain intact.
  • Figure 3: VulMatch consists of four steps: Data Preparation, Locating Signature Instructions, Constructing Context-aware Binary-level Signatures, and Signature Matching. Src is short for source code. Bin is short for binary code. Insn is short for instruction.
  • Figure 4: An example of an incomplete mapping between source code and binary code. Lines 1226 and 1228 have no corresponding instructions in the binary code because assembly code does not need to specify type information for functions and variables. Line 1230 maps to two different basic blocks. Line 1231 maps to one basic block. This example is extracted from openjpeg version 1.5.0.
  • Figure 5: Examples of add, delete and change types. Green lines are the newly added or changed instructions in the patched version. Red lines are the deleted or changed lines in the vulnerable version. Grey lines are the intact lines.
  • ...and 3 more figures