PS$^3$: Precise Patch Presence Test based on Semantic Symbolic Signature
Qi Zhan, Xing Hu, Zhiyang Li, Xin Xia, David Lo, Shanping Li
TL;DR
PS3 addresses the challenge of detecting whether a known vulnerability patch is present in a target binary that may have been built with different compiler options than the reference. It introduces semantic-level signatures derived from symbolic emulation and a theorem-prover-based matching engine to test patch presence with high precision and robustness. Evaluated on a dataset of 3,631 (CVE, binary) pairs from four projects, PS3 achieves an $F1$ score of $0.89$, outperforming state-of-the-art baselines and showing resilience to compiler and optimization variations. The work demonstrates practical impact for large-scale software security by enabling precise, reusable patch presence checks without relying on identical build configurations, and provides public code and data for further research.
Abstract
Software vulnerabilities pose a significant threat to users, and patches are the most effective way to combat them. In a large-scale software system, testing for the presence of a security patch in every affected binary is crucial to ensuring system security. Identifying whether a binary has been patched for a known vulnerability is challenging, as there may be only small differences between the patched and vulnerable versions. Existing approaches mainly focus on detecting patches in binaries compiled with the same compiler options as the reference. However, developers commonly compile programs with very different compiler options in different situations, which makes existing methods inaccurate. In this paper, we propose a new approach named PS3, short for precise patch presence test based on semantic-level symbolic signature. PS3 exploits symbolic emulation to extract signatures that are stable under different compiler options. PS3 then tests the presence of the patch precisely by comparing the signatures of the reference and the target at the semantic level. To evaluate the effectiveness of our approach, we constructed a dataset consisting of 3,631 (CVE, binary) pairs covering 62 recent CVEs in four C/C++ projects. The experimental results show that PS3 achieves a precision of 0.82, a recall of 0.97, and an F1 score of 0.89. PS3 outperforms the state-of-the-art baselines by 33% in terms of F1 score and remains stable under different compiler options.
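To illustrate why comparing signatures at the semantic level can be more robust than comparing them syntactically, the following is a minimal toy sketch and not the PS3 implementation. It swaps symbolic emulation and theorem proving for brute-force enumeration over a hypothetical 8-bit machine word. The functions `ref_patched`, `ref_vulnerable`, and `target_optimized`, as well as the bounds-check example itself, are assumptions introduced only for illustration.

```python
from itertools import product

WORD = 0xFF  # assumed toy 8-bit machine word, so every input can be enumerated


def ref_patched(idx, size):
    # Hypothetical reference signature: patched bounds check,
    # with the byte offset computed by multiplication.
    return ((idx * 4) & WORD) < size


def ref_vulnerable(idx, size):
    # Hypothetical reference signature: pre-patch check with an off-by-one error.
    return ((idx * 4) & WORD) <= size


def target_optimized(idx, size):
    # Hypothetical target signature: the same patched check, but built with
    # different options, so a shift appears instead of a multiplication.
    return ((idx << 2) & WORD) < size


def equivalent(f, g):
    # Brute-force semantic equivalence: the two conditions agree on every input.
    # This stands in for a solver query; it only scales to tiny word sizes.
    inputs = product(range(WORD + 1), repeat=2)
    return all(f(a, b) == g(a, b) for a, b in inputs)


def patch_present(target, patched, vulnerable):
    # The target counts as patched if it matches the patched reference
    # semantically and does not match the vulnerable reference.
    return equivalent(target, patched) and not equivalent(target, vulnerable)


print(patch_present(target_optimized, ref_patched, ref_vulnerable))
```

In this toy setting the target and the patched reference differ textually (`<< 2` versus `* 4`) yet agree on all inputs, while the vulnerable reference disagrees whenever the offset equals `size`. A syntactic comparison would flag the first pair as different; a semantic comparison does not.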
