Fool Me If You Can: On the Robustness of Binary Code Similarity Detection Models against Semantics-preserving Transformations

Jiyong Uhm; Minseok Kim; Michalis Polychronakis; Hyungjoon Koo

TL;DR

This work investigates the robustness of machine-learning-based binary code similarity detection (BCSD) models against semantics-preserving transformations. It introduces asmFooler, a system that generates 9,565 adversarial binary variants from 620 baseline samples by applying eight transformations, and uses them to evaluate six representative BCSD models. The study finds that robustness depends strongly on the preprocessing pipeline, architecture, and feature choices, and that attack efficacy is constrained by a transformation budget shaped by input size and instruction expressive capacity. FP-trigger perturbations can achieve near-perfect success rates and transfer across similar architectures, while explainable AI reveals how these perturbations distort internal decision mechanisms. The results highlight the need for diverse features and adversarial-aware training to build more resilient BCSD systems, and the authors provide open-source tooling and data to support future research.

Abstract

Binary code analysis plays an essential role in cybersecurity, facilitating reverse engineering to reveal the inner workings of programs in the absence of source code. Traditional approaches, such as static and dynamic analysis, extract valuable insights from stripped binaries, but often demand substantial expertise and manual effort. Recent advances in deep learning have opened promising opportunities to enhance binary analysis by capturing latent features and disclosing underlying code semantics. Despite the growing number of binary analysis models based on machine learning, their robustness to adversarial code transformations at the binary level remains underexplored. We evaluate the robustness of deep learning models for the task of binary code similarity detection (BCSD) under semantics-preserving transformations. The unique nature of machine instructions presents distinct challenges compared to the typical input perturbations found in other domains. We introduce asmFooler, a system that evaluates the resilience of BCSD models using a diverse set of adversarial code transformations that preserve functional semantics. We construct a dataset of 9,565 binary variants from 620 baseline samples by applying eight semantics-preserving transformations across six representative BCSD models. Our major findings highlight several key insights: i) model robustness relies on the processing pipeline, including code pre-processing, architecture, and feature selection; ii) adversarial transformation effectiveness is bounded by a budget shaped by model-specific constraints like input size and instruction expressive capacity; iii) well-crafted transformations can be highly effective with minimal perturbations; and iv) such transformations efficiently disrupt model decisions (e.g., misleading to false positives or false negatives) by focusing on semantically significant instructions.
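The link between the two failure modes named in the abstract and the usual detection metrics can be illustrated with a minimal Python sketch. It uses only the standard definitions of precision, recall, and F1; the helper name `prf1` and all counts are hypothetical and are not results from the paper.

```python
def prf1(tp, fp, fn):
    """Return (precision, recall, F1) from true-positive, false-positive,
    and false-negative counts, using the standard definitions."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical counts, chosen only for illustration.
baseline = prf1(tp=90, fp=10, fn=10)

# FN-triggering: 30 true matches are no longer recognized as similar.
fn_attack = prf1(tp=60, fp=10, fn=40)

# FP-triggering: 30 unrelated functions are now reported as similar.
fp_attack = prf1(tp=90, fp=40, fn=10)

for name, scores in [("baseline", baseline),
                     ("FN-trigger", fn_attack),
                     ("FP-trigger", fp_attack)]:
    p, r, f = scores
    print(f"{name:10s} precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

In this toy setting, perturbations that cause false negatives mainly lower recall, while perturbations that cause false positives lower precision and leave recall unchanged.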

Paper Structure (26 sections, 9 figures, 5 tables, 1 algorithm)

Introduction
Background
Semantics-preserving Code Transformations
Binary Code Similarity Detection
Robustness of ML-based BCSD Models
asmFooler Design
Generating Binary Variants
Code Diversification Techniques
Code Obfuscation Techniques
Evaluating the Robustness of BCSD Models
FN-triggering Perturbation
FP-triggering Perturbation
Transferability of FP-Triggers
Code Perturbation Examples
Implementation
...and 11 more sections

Figures (9)

Figure 1: Overview of the asmFooler system with two main components: generating binary variants with various semantics-preserving code transformations (Section \ref{ss:variants}) and evaluating the robustness of six pre-selected BCSD models (Section \ref{ss:robustness}). We assess the robustness of the models using adversarial samples designed to trigger either false negatives (FN) or false positives (FP). Additionally, we investigate the transferability of FP-trigger samples, i.e., whether a sample that misleads one model also affects others. Note that M1 to M6 denote the six BCSD models in our study.
Figure 2: Adversarial semantics-preserving code transformations in asmFooler. We adopt a wide range of transformation techniques from i) code diversification for defending against code reuse attacks (Section \ref{sss:diversification}) and ii) code obfuscation for making static analysis challenging (Section \ref{sss:obfuscation}).
Figure 3: Example of a semantic NOP sequence with the context-free grammar from Lucas et al. \cite{lucas2021malware}. The chunk of instructions does not impact the semantics of a program because subsequent instructions counteract the side effects of one or more earlier instructions. For example, pushing the value of r10 to the stack (Line 3) and adding an arbitrary value (Line 4) can be reversed by pop r10 (Line 11).
Figure 4: Example of an FP-triggering perturbation applied to the initialize_eval function from the sjeng binary in SPEC2006. The instructions at addresses 0x8059 to 0x807d (Lines 4–9) represent the adversarial sequence inserted into the function prologue. To preserve original semantics, the instructions at 0x8050 and 0x8053 redirect the control flow to skip over the injected code. Note that we insert NOP padding as a length budget at the function prologue (e.g., addresses between 0x8050 and 0x830c in this example), followed by injecting the FP-triggering code to enable further variations. Thus, the space between jne 0x8090 and push %rbp is filled with NOP instructions, keeping the original semantics intact.
Figure 5: Precision, Recall, and F1 of various BCSD models with a given transformation budget (e.g., size of a semantic NOP in bytes). Models that adopt a graph neural network, such as Genius \cite{feng2016scalable} and Gemini \cite{xu2017neural}, tend to be robust against semantic NOP implantation, while the others are not (e.g., they show significant drops in recall). We discuss precision with FP-triggering perturbation in Section \ref{ss:RQ3}. Note that a byte does not necessarily correspond to a single token or instruction; e.g., our experiments insert around 7 and 25 instructions on average under the budgets of 20 and 100 bytes, respectively.
...and 4 more figures
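The semantic NOP idea described in the Figure 3 caption can be sketched with a toy interpreter in Python. This is a simplified illustration and not asmFooler's implementation: the `run` function, the three-instruction sequence, and the initial register and stack values are all hypothetical, and the model tracks only registers and the stack. It ignores CPU flags, which a real `add` would modify, so a real semantic NOP would have to account for them as well.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def run(program, regs, stack):
    """Execute a toy instruction list on copies of the given state.

    Supported instructions (a tiny, hypothetical subset):
      ("push", reg)      push the register value onto the stack
      ("pop", reg)       pop the top of the stack into the register
      ("add", reg, imm)  add an immediate to the register (64-bit wraparound)
    """
    regs, stack = dict(regs), list(stack)
    for op, *args in program:
        if op == "push":
            stack.append(regs[args[0]])
        elif op == "pop":
            regs[args[0]] = stack.pop()
        elif op == "add":
            reg, imm = args
            regs[reg] = (regs[reg] + imm) & MASK64
        else:
            raise ValueError(f"unsupported op: {op}")
    return regs, stack


# Save r10, clobber it with an arbitrary value, then restore it.
semantic_nop = [
    ("push", "r10"),
    ("add", "r10", 0x1337),
    ("pop", "r10"),
]

before_regs = {"r10": 42, "rax": 7}
before_stack = [0xDEADBEEF]
after_regs, after_stack = run(semantic_nop, before_regs, before_stack)

# The sequence is a semantic NOP in this model if the state is unchanged.
is_semantic_nop = (after_regs == before_regs and after_stack == before_stack)
print(is_semantic_nop)
```

The check compares the full register and stack state before and after execution, which is the property that lets such a sequence be inserted without changing what the function computes.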
