Dynamic Neural Control Flow Execution: An Agent-Based Deep Equilibrium Approach for Binary Vulnerability Detection

Litao Li, Steven H. H. Ding, Andrew Walenstein, Philippe Charland, Benjamin C. M. Fung

TL;DR

The paper tackles binary-code vulnerability detection by addressing CFG overestimation and limited global context in traditional graph models. It introduces DeepEXE, an agent-based implicit neural network that simulates dynamic program execution on CFGs, guided by program state and reinforced branching decisions, with equilibrium-based implicit GNNs and Anderson acceleration for training. Across semi-synthetic and real-world datasets, DeepEXE outperforms state-of-the-art baselines, demonstrating improved accuracy and AUC while handling large graphs with a near-unlimited receptive field. The approach offers a practical and scalable framework for binary vulnerability detection, with potential extensions to other security tasks and domains that require modeling complex execution dynamics.

Abstract

Software vulnerabilities are a challenge in cybersecurity. Manual security patches are often difficult and slow to deploy, while new vulnerabilities continue to be created. Binary code vulnerability detection is less studied and more complex than source code vulnerability detection, which has important practical implications. Deep learning has become an efficient and powerful tool in the security domain, where it provides end-to-end and accurate prediction. Modern deep learning approaches learn program semantics through sequence and graph neural networks, using various intermediate representations of programs, such as abstract syntax trees (AST) or control flow graphs (CFG). Due to the complex nature of program execution, the output of an execution depends on many program states and inputs. Also, a CFG generated from static analysis can be an overestimation of the true program flow. Moreover, the size of programs often does not allow a graph neural network with a fixed number of layers to aggregate global information. To address these issues, we propose DeepEXE, an agent-based implicit neural network that mimics the execution path of a program. We use reinforcement learning to enhance the branching decision at every program state transition and create a dynamic environment to learn the dependency between a vulnerability and certain program states. An implicitly defined neural network enables nearly infinite state transitions until convergence, which captures the structural information at a higher level. The experiments are conducted on two semi-synthetic and two real-world datasets. We show that DeepEXE is an accurate and efficient method that outperforms state-of-the-art vulnerability detection methods.
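
The idea of "nearly infinite state transitions until convergence" can be illustrated with a toy fixed-point iteration on a small CFG. This is only a minimal sketch under stated assumptions, not the authors' implementation: node states are scalars, the update rule `tanh(weight * A_norm z + inputs)` and the names `equilibrium_states`, `adj`, and `inputs` are hypothetical, and plain iteration stands in for the Anderson acceleration mentioned in the TL;DR. Convergence here relies on `weight` being below 1, which makes the update a contraction.

```python
import math

def equilibrium_states(adj, inputs, weight=0.5, tol=1e-8, max_iter=1000):
    """Iterate z <- tanh(weight * A_norm z + inputs) until z stops changing."""
    n = len(adj)
    # Row-normalise the adjacency so each node averages over its successors.
    norm = []
    for row in adj:
        deg = sum(row)
        norm.append([v / deg if deg else 0.0 for v in row])
    z = [0.0] * n
    for step in range(1, max_iter + 1):
        new_z = [
            math.tanh(weight * sum(norm[i][j] * z[j] for j in range(n)) + inputs[i])
            for i in range(n)
        ]
        residual = max(abs(a - b) for a, b in zip(new_z, z))
        z = new_z
        if residual < tol:
            return z, step
    return z, max_iter

# Hypothetical 5-block CFG: 0 -> 1 -> 2 -> 3 -> 4, plus a loop edge 3 -> 1.
adj = [
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
]
# Only the last block carries a signal (assumed stand-in for node semantics).
inputs = [0.0, 0.0, 0.0, 0.0, 1.0]

states, steps = equilibrium_states(adj, inputs)
print(steps, [round(s, 4) for s in states])
```

In this sketch the entry block ends up with a nonzero state even though the signal sits four edges away, which a two-layer message-passing model over the same graph could not achieve. That is the sense in which an equilibrium model has a much larger receptive field than a fixed-depth one.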

Paper Structure

This paper contains 17 sections, 18 equations, 2 figures, 4 tables.

Figures (2)

  • Figure 1: In each epoch, the model simulates one execution session that follows a specific execution path consisting of multiple steps. At step $i$, the executor chooses the most likely branch to take next from the current node $j$, based on the program state and node semantics. The model then updates the program state by incorporating the next node's code semantics. The figure shows one execution with a loop (Epoch 1) and one without (Epoch 2).
  • Figure 2: Overall architecture of DeepEXE. The model has four major segments: input preprocessing, node embedding through a sequential model, state transition and structure learning, and prediction and training. DeepEXE combines local instruction semantics with high-level topological information, where program dependencies are captured through the use of a REINFORCE agent and a GNN with a much larger receptive field.
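
The step-by-step execution described in Figure 1 can be sketched as a walk over a CFG in which a policy scores each outgoing branch from the current program state. This is a hedged illustration only: the scalar state, the scoring rule `policy_weight * state * semantics[b]`, the `tanh` state update, and the names `simulate_path`, `successors`, and `semantics` are all assumptions made for the example, not details taken from the paper. Branches are sampled from a softmax so that log-probabilities are available for a REINFORCE-style update; taking the argmax instead would match the caption's "most likely branch" wording.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def simulate_path(successors, semantics, policy_weight, start=0, max_steps=10, rng=None):
    """Walk the CFG once, sampling a branch at each step from the policy."""
    rng = rng or random.Random(0)
    node, state = start, semantics[start]
    path, log_probs = [node], []
    for _ in range(max_steps):
        branches = successors.get(node, [])
        if not branches:
            break  # reached an exit block
        # Score each candidate branch from the current state (assumed rule).
        scores = [policy_weight * state * semantics[b] for b in branches]
        probs = softmax(scores)
        choice = rng.choices(range(len(branches)), weights=probs)[0]
        log_probs.append(math.log(probs[choice]))
        node = branches[choice]
        # Fold the next node's semantics into the program state.
        state = math.tanh(state + semantics[node])
        path.append(node)
    return path, log_probs

# Hypothetical CFG with a loop (1 -> 2 -> 1) and an exit block 3.
successors = {0: [1], 1: [2, 3], 2: [1], 3: []}
semantics = [0.2, 0.5, -0.3, 0.8]

path, log_probs = simulate_path(successors, semantics, policy_weight=2.0)
print(path, sum(log_probs))
```

In a REINFORCE setup, the summed log-probabilities of the sampled path would be weighted by a reward (for example, one derived from the classification loss) to form the policy gradient. Different seeds yield paths that do or do not traverse the loop, loosely mirroring the two epochs shown in Figure 1.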