VulBinLLM: LLM-powered Vulnerability Detection for Stripped Binaries

Nasir Hussain, Haohan Chen, Chanh Tran, Philip Huang, Zhuohao Li, Pravir Chugh, William Chen, Ashish Kundu, Yuan Tian

TL;DR

Vulnerabilities in stripped binaries are hard to detect due to information loss in decompilation and limited context for LLMs. The authors introduce Vul-BinLLM, an end-to-end framework that enhances decompiled code with vulnerability-focused syntactic information and employs an extended context memory plus a function queue to scale vulnerability reasoning beyond native context windows. By combining neural decompilation, prompt engineering (in-context learning and chain-of-thought), and a memory management agent, Vul-BinLLM achieves state-of-the-art performance on the Juliet C/C++ vulnerability suite compared to LATTE, with notable improvements in CWE classification accuracy. The work demonstrates the feasibility of LLM-powered binary vulnerability detection and suggests a scalable path toward automated security analysis of stripped binaries, with implications for faster vulnerability discovery in real-world software.

Abstract

Recognizing vulnerabilities in stripped binary files presents a significant challenge in software security. Although some progress has been made in generating human-readable information from decompiled binary files with Large Language Models (LLMs), effectively and scalably detecting vulnerabilities within these binary files is still an open problem. This paper explores the novel application of LLMs to detect vulnerabilities within these binary files. We demonstrate the feasibility of identifying vulnerable programs through a combined approach of decompilation optimization to make the vulnerabilities more prominent and long-term memory for a larger context window, achieving state-of-the-art performance in binary vulnerability analysis. Our findings highlight the potential for LLMs to overcome the limitations of traditional analysis methods and advance the field of binary vulnerability detection, paving the way for more secure software systems. In this paper, we present Vul-BinLLM, an LLM-based framework for binary vulnerability detection that mirrors traditional binary analysis workflows with fine-grained optimizations in decompilation and vulnerability reasoning with an extended context. In the decompilation phase, Vul-BinLLM adds vulnerability and weakness comments without altering the code structure or functionality, providing more contextual information for vulnerability reasoning later. Then for vulnerability reasoning, Vul-BinLLM combines in-context learning and chain-of-thought prompting along with a memory management agent to enhance accuracy. Our evaluations encompass the commonly used synthetic dataset Juliet to evaluate the potential feasibility for analysis and vulnerability detection in C/C++ binaries. Our evaluations show that Vul-BinLLM is highly effective in detecting vulnerabilities on the compiled Juliet dataset.
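
The extended-context idea described above (a queue of functions feeding an analyzer, with earlier findings kept in an archive) can be illustrated with a minimal sketch. This is not the authors' implementation: `Function`, `analyze_stub`, and `run_queue` are hypothetical names, and the LLM call is replaced by a toy heuristic so the example runs offline. It shows only the scheduling pattern, in which a function is deferred once so that findings about its callees can be passed along as context.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Function:
    """One decompiled function plus the names of the functions it calls."""
    name: str
    code: str
    callees: list[str] = field(default_factory=list)


def analyze_stub(code: str, context: list[str]) -> str:
    """Hypothetical stand-in for an LLM call; flags strcpy as a toy heuristic."""
    verdict = "possible CWE-121" if "strcpy(" in code else "no finding"
    return f"{verdict} (context notes used: {len(context)})"


def run_queue(functions: list[Function], analyze=analyze_stub) -> dict[str, str]:
    """Analyze functions one at a time, archiving each result for later callers."""
    by_name = {f.name: f for f in functions}
    queue = deque(functions)
    archive: dict[str, str] = {}   # assumed "archived analysis" store
    deferred: set[str] = set()
    while queue:
        fn = queue.popleft()
        pending = [c for c in fn.callees if c in by_name and c not in archive]
        if pending and fn.name not in deferred:
            # Defer once so callees are archived first; the second visit
            # proceeds regardless, which keeps recursive code from looping.
            deferred.add(fn.name)
            queue.append(fn)
            continue
        # Only the short archived notes travel with the prompt, not callee code.
        context = [f"{c}: {archive[c]}" for c in fn.callees if c in archive]
        archive[fn.name] = analyze(fn.code, context)
    return archive
```

Because only archived notes are forwarded, the text sent per call stays small even when the call graph is large, which is the motivation for pairing a queue with an archive in this sketch.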

Paper Structure

This paper contains 20 sections, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Workflow of Vul-BinLLM: (1) Binary files are decompiled into source code, where an LLM-assisted decompiler enriches the code with contextual information for vulnerability detection. (2) The decompiled source code is then analyzed by Vul-BinLLM, which keeps an archival storage of its analyses; the analyzer then provides a comprehensive vulnerability detection result. (3) VulBinQ: an additional queue that manages the functions to be analyzed by Vul-BinLLM and serves as middleware between the Archived Analysis and Vul-BinLLM.
  • Figure 2: An example of a buffer overflow vulnerability: the detection of this vulnerability relies on human expertise in security review. Human experts are able to detect the above vulnerability, but its subtlety may make it difficult for an LLM to detect.
  • Figure 3: An example of Matrix Multiplication decompilation output across different stages: the original source code, Ghidra, Ghidra enhanced with GPT-4o, and Vul-BinLLM. The Ghidra decompilation provides a low-level representation with generic variable names and lacks context, making the functionality and security aspects harder to interpret. Ghidra + GPT-4o improves readability with meaningful variable names and clarifying comments. Vul-BinLLM further augments the output by adding vulnerability-specific annotations, such as warnings about potential pointer arithmetic issues that could lead to buffer overflows or memory access vulnerabilities. This layered enhancement helps bridge the gap between decompilation and security analysis, making Vul-BinLLM particularly beneficial for identifying and understanding vulnerabilities in binary code.
  • Figure 4: Vul-BinLLM-Decompiler overview. The Vul-BinLLM-decompiler includes an Optimization Decision Agent and three Action Agents (Vul-variable, Vul-struct, Vul-comment). After receiving raw decompilation output from a reverse engineering tool, the Vul-BinLLM-decompiler performs an initial check of the grammar, functionality, and structures and decides which optimizations to apply. By sending requests to the Action Agents, it renames variables and functions, reorganizes the defined code structure, and, critically, appends explanations of potential vulnerabilities and functionality to the code. The Vul-BinLLM-decompiler can also learn from example code through in-context learning.
  • Figure 5: An example of binary classification with CWE-78. The LLM is required to respond with yes or no when asked if it is concentrating on the code flow rather than semantics. To avoid LLM memorization in our special case, we append multiple CWEs (CWE-121: Stack-based Buffer Overflow, CWE-787: Out-of-bounds Write).
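
The yes/no classification described in the Figure 5 caption can be sketched as a prompt builder and a reply parser. This is one possible reading of the caption, not the paper's actual prompt: the prompt wording, `build_prompt`, and `parse_answers` are assumptions made for illustration, and no model is called. The target CWE is listed alongside additional CWEs so that the reply has to address each weakness separately.

```python
# CWE identifiers mentioned in the caption, with their MITRE names.
CWE_NAMES = {
    "CWE-78": "OS Command Injection",
    "CWE-121": "Stack-based Buffer Overflow",
    "CWE-787": "Out-of-bounds Write",
}


def build_prompt(code: str, cwe_ids: list[str]) -> str:
    """Assemble a yes/no question per CWE (hypothetical wording)."""
    questions = "\n".join(
        f"- {cid} ({CWE_NAMES[cid]}): yes or no?" for cid in cwe_ids
    )
    return (
        "Follow the control and data flow of the decompiled function below.\n"
        "For each weakness listed, answer strictly 'yes' or 'no'.\n\n"
        f"{code}\n\n{questions}\n"
    )


def parse_answers(reply: str, cwe_ids: list[str]) -> dict[str, bool]:
    """Read one 'CWE-ID: yes/no' line per weakness; unanswered IDs stay False."""
    answers = {cid: False for cid in cwe_ids}
    for line in reply.splitlines():
        head, sep, tail = line.partition(":")
        # Drop a leading bullet, then keep the first token as the CWE ID.
        cid = head.strip().lstrip("-• ").split(" ")[0]
        if sep and cid in answers:
            answers[cid] = tail.strip().lower().startswith("yes")
    return answers
```

Defaulting unanswered identifiers to `False` is a design choice of this sketch; a stricter variant could raise an error when the reply omits a listed CWE.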