
PPT4J: Patch Presence Test for Java Binaries

Zhiyuan Pan, Xing Hu, Xin Xia, Xian Zhan, David Lo, Xiaohu Yang

TL;DR

The paper tackles the problem of verifying whether security patches are present in Java binaries, a concern driven by software supply chain risks from vulnerable libraries. It introduces PPT4J, which links source-code semantic changes to binary features to perform patch presence testing as a boolean test $f: (C_1, C_2, P, B) \rightarrow \{True, False\}$. Here $C_1$ and $C_2$ are the pre-patch and post-patch source code, $P$ is the patch, $B$ is the Java bytecode under test, and at least two of $C_1$, $C_2$, $P$ are provided (any two suffice to recover the semantic change). PPT4J extracts semantic changes from the patch and uses them to guide source-to-binary feature matching and feature queries, producing a verdict that confirms or denies patch presence. On a dataset of 110 vulnerabilities, PPT4J achieves an F1 score of 98.5% and improves on the baseline by 14.2% while maintaining reasonable efficiency. An in-the-wild evaluation on JetBrains IntelliJ IDEA demonstrates its practical applicability and reveals a third-party library left unpatched for two CVEs. Replication materials are released to support future work.
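The "at least two of three" condition works because the patch $P$ is exactly the difference between $C_1$ and $C_2$. The Python sketch below illustrates that normalization step under stated assumptions: the function name `changed_lines` and the choice of unified-diff format for $P$ are hypothetical, not PPT4J's interface. If $P$ is given, its `-`/`+` lines are read directly; otherwise the diff is recomputed from $C_1$ and $C_2$ with the standard library's `difflib`.

```python
import difflib
from typing import List, Optional, Tuple


def changed_lines(
    c1: Optional[str], c2: Optional[str], p: Optional[str]
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: return (deleted, added) source lines of a patch.

    c1: pre-patch source, c2: post-patch source, p: patch in unified-diff
    format (an assumption). At least two of the three must be provided.
    """
    if sum(x is not None for x in (c1, c2, p)) < 2:
        raise ValueError("at least two of C1, C2, P must be provided")

    if p is not None:
        # The patch already encodes the change; read it directly.
        diff = p.splitlines()
    else:
        # Only C1 and C2 are available: recompute the patch as their diff.
        diff = list(
            difflib.unified_diff(c1.splitlines(), c2.splitlines(), lineterm="")
        )

    deleted: List[str] = []
    added: List[str] = []
    for line in diff:
        # Skip file headers and hunk headers; keep only changed lines.
        if line.startswith(("--- ", "+++ ", "@@")):
            continue
        if line.startswith("-"):
            deleted.append(line[1:].strip())
        elif line.startswith("+"):
            added.append(line[1:].strip())
    return deleted, added
```

Whichever two inputs are supplied, the output is the same (deleted, added) pair, so later matching stages only ever see one representation of the semantic change.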

Abstract

The number of vulnerabilities reported in open-source software has increased substantially in recent years. Security patches provide the necessary measures to protect software from attacks and vulnerabilities. In practice, it is difficult to identify whether patches have been integrated into software, especially if only binary files are available. Therefore, the ability to test whether a patch is applied to the target binary, a.k.a. patch presence test, is crucial for practitioners. However, it is challenging to obtain accurate semantic information from patches, and inaccurate information can lead to incorrect results. In this paper, we propose a new patch presence test framework named PPT4J (Patch Presence Test for Java Binaries). PPT4J is designed for open-source Java libraries. It takes Java binaries (i.e., bytecode files) as input, extracts semantic information from patches, and uses feature-based techniques to identify patch lines in the binaries. To evaluate the effectiveness of PPT4J, we construct a dataset of binaries covering 110 vulnerabilities. The results show that PPT4J achieves an F1 score of 98.5% with reasonable efficiency, improving on the baseline by 14.2%. Furthermore, we conduct an in-the-wild evaluation of PPT4J on JetBrains IntelliJ IDEA. The results suggest that a third-party library included in the software is not patched for two CVEs, and we have reported this potential security problem to the vendor.
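To make "feature-based techniques to identify patch lines" more concrete, here is a deliberately simplified Python sketch. It assumes that source lines are summarized by features that survive compilation (method-call names and literal constants, which remain visible in bytecode as invoke instructions and constant operands) and that the binary's features have already been extracted into a set. The regex extractor, the `call:`/`str:`/`int:` feature strings, and the scoring rule are all hypothetical stand-ins, not PPT4J's actual matching algorithm. A patch is judged present when features unique to the added lines show up in the binary and features unique to the deleted lines do not.

```python
import re
from typing import Iterable, Optional, Set

# Hypothetical feature extractor for one Java source line: names of called
# methods/constructors plus string and integer literals.
CALL = re.compile(r"\b([A-Za-z_]\w*)\s*\(")
STRING = re.compile(r'"((?:[^"\\]|\\.)*)"')
NUMBER = re.compile(r"\b\d+\b")
KEYWORDS = {"if", "for", "while", "switch", "catch", "synchronized", "return"}


def line_features(line: str) -> Set[str]:
    feats = {f"call:{name}" for name in CALL.findall(line) if name not in KEYWORDS}
    feats |= {f"str:{s}" for s in STRING.findall(line)}
    feats |= {f"int:{n}" for n in NUMBER.findall(line)}
    return feats


def patch_present(
    deleted: Iterable[str], added: Iterable[str], binary_features: Set[str]
) -> Optional[bool]:
    """Toy verdict: True = patched, False = unpatched, None = undecidable."""
    del_feats: Set[str] = set().union(*(line_features(line) for line in deleted))
    add_feats: Set[str] = set().union(*(line_features(line) for line in added))

    # Only features that differ between the two versions carry evidence.
    only_added = add_feats - del_feats
    only_deleted = del_feats - add_feats
    if not only_added and not only_deleted:
        return None

    hits_added = len(only_added & binary_features)
    hits_deleted = len(only_deleted & binary_features)
    # Evidence for "patched": added features present, deleted features absent.
    patched = hits_added + (len(only_deleted) - hits_deleted)
    unpatched = (len(only_added) - hits_added) + hits_deleted
    if patched == unpatched:
        return None
    return patched > unpatched
```

Features shared by the added and deleted lines (for example a call that appears in both versions) contribute nothing, which is why the sketch takes set differences before querying the binary.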

Paper Structure

This paper contains 3 sections, 1 equation, and 1 figure.

Theorems & Definitions (1)

  • Definition