Can Neural Decompilation Assist Vulnerability Prediction on Binary Code?
D. Cotroneo, F. C. Grasso, R. Natella, V. Orbinato
TL;DR
The paper presents a pipeline that enables vulnerability prediction directly on binary code. It first neurally decompiles binaries into high-level C/C++ and then applies DL-based classifiers to the decompiled code. On the Juliet dataset, the approach achieves high-quality neural decompilation (edit-distance (ED) score around 59%) and strong vulnerability-prediction performance (bi-class F1 ≈ 0.95; multi-class accuracy ≈ 0.83 with CodeBERT). It shows that decompiled source code, treated as plain text, can outperform traditional disassembly-based or IR-based methods, both for detecting vulnerable programs and for mapping them to MITRE CWE categories. The results suggest a practical, architecture-agnostic path for binary vulnerability analysis that combines transformer-based decompilation with CWE-aligned labeling.
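The three figures quoted above come from standard metrics. The sketch below shows how they are commonly computed. It assumes that ED is a Levenshtein distance over whitespace-separated tokens, reported as a normalized similarity (1 minus distance divided by the longer length). This is an assumption: the paper may compute ED at character level or report the raw normalized distance. The helper names (`levenshtein`, `edit_similarity`, `f1_score`, `accuracy`) are illustrative and not taken from the authors' code.

```python
def levenshtein(a, b):
    """Dynamic-programming edit distance between two sequences (str or list)."""
    prev = list(range(len(b) + 1))  # distances from empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def edit_similarity(pred, ref):
    """Assumed ED score: 1 - token-level distance / longer length (1.0 = identical)."""
    p, r = pred.split(), ref.split()
    if not p and not r:
        return 1.0
    return 1.0 - levenshtein(p, r) / max(len(p), len(r))


def f1_score(y_true, y_pred, positive=1):
    """Bi-class F1 for the 'vulnerable' (positive) label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Multi-class accuracy, e.g. over CWE labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
```

For example, `edit_similarity("a b c d", "a b x d")` differs by one substituted token out of four and yields 0.75.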
Abstract
Vulnerability prediction is valuable for identifying security issues efficiently, but it requires the source code of the target software system, which is a restrictive assumption. This paper presents an experimental study that predicts vulnerabilities in binary code without the source code or complex representations of the binary. The pivotal idea is to decompile the binary file through neural decompilation and then predict vulnerabilities with deep learning on the decompiled source code. The results outperform the state of the art in both neural decompilation and vulnerability prediction, showing that this approach can identify vulnerable programs in both bi-class (vulnerable/non-vulnerable) and multi-class (type of vulnerability) analysis.
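The bi-class and multi-class tasks differ only in the label attached to each sample. The sketch below shows one way such labels could be derived from Juliet test cases. It relies on Juliet's naming conventions: files start with `CWE<id>_`, and flawed functions are named with a `bad` suffix (`..._bad`, `..._badSink`) while fixed ones use `good` (`goodG2B`, `goodB2G`). The function `juliet_labels` and the choice of `None` as the multi-class label for non-vulnerable code are hypothetical; the paper's actual labeling procedure may differ.

```python
import re
from typing import Optional, Tuple

# Juliet files look like "CWE121_Stack_Based_Buffer_Overflow__CWE129_fgets_01.c".
CWE_PREFIX = re.compile(r"^CWE(\d+)_")


def juliet_labels(filename: str, function_name: str) -> Tuple[int, Optional[str]]:
    """Hypothetical labeling: (bi-class label, multi-class CWE label).

    Bi-class: 1 if the function is a flawed ("bad*") variant, else 0.
    Multi-class: "CWE-<id>" for flawed functions, None for non-vulnerable ones.
    """
    match = CWE_PREFIX.match(filename)
    if match is None:
        raise ValueError(f"not a Juliet-style file name: {filename}")
    # Simplification: look only at the part after the last underscore,
    # e.g. "bad", "badSink", "goodG2B", "goodB2GSink".
    suffix = function_name.rsplit("_", 1)[-1]
    if suffix.startswith("bad"):
        return 1, f"CWE-{match.group(1)}"
    return 0, None
```

Keeping both labels in one function makes it easy to train the bi-class and multi-class classifiers on the same decompiled samples.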
