A Match Made in Heaven? AI-driven Matching of Vulnerabilities and Security Unit Tests

Emanuele Iannone; Quang-Cuong Bui; Riccardo Scandariato

TL;DR

VuTeCo presents the first fully static framework to locate vulnerability-witnessing unit tests and link them to the vulnerabilities they witness in Java projects. It defines two binary-classification tasks, Finding and Matching, implemented with UniXcoder and DeepSeek Coder, respectively, and leverages a tool-assisted input pipeline to harvest tests and CVE descriptions. In experiments on Vul4J, VuTeCo achieves $F_{0.5}$ scores of up to $0.73$ for Finding and $0.65$ for Matching. In a large in-the-wild deployment over 427 projects, it yields 224 confirmed security-related tests and 35 tests correctly matched to 29 vulnerabilities, which are collected in the Test4Vul dataset. The approach demonstrates practical utility by narrowing the search space for security tests and enabling downstream AI-driven security test generation, patch validation, and vulnerability-traceability tasks. The authors release Test4Vul and provide a replication package to support future research and tooling in software security testing.

Abstract

Software vulnerabilities are often detected via taint analysis, penetration testing, or fuzzing. They are also found via unit tests that exercise security-sensitive behavior with specific inputs, called vulnerability-witnessing tests. Generative AI models could help developers write such tests, but they require many examples to learn from, and these are currently scarce. This paper introduces VuTeCo, an AI-driven framework for collecting examples of vulnerability-witnessing tests from Java repositories. VuTeCo carries out two tasks: (1) the "Finding" task, which determines whether a unit test case is security-related, and (2) the "Matching" task, which relates a test case to the vulnerability it witnesses. VuTeCo addresses the Finding task with UniXcoder, achieving an $F_{0.5}$ score of 0.73 and a precision of 0.83 on a test set of unit tests from Vul4J. The Matching task is addressed using DeepSeek Coder, achieving an $F_{0.5}$ score of 0.65 and a precision of 0.75 on a test set of pairs of unit tests and vulnerabilities from Vul4J. VuTeCo has been used in the wild on 427 Java projects and 1,238 vulnerabilities, obtaining 224 test cases confirmed to be security-related and 35 tests correctly matched to 29 vulnerabilities. The validated tests were collected in a new dataset called Test4Vul. VuTeCo lays the foundation for large-scale retrieval of vulnerability-witnessing tests, enabling future AI models to better understand and generate security unit tests.
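The abstract reports results as F0.5, the F-beta score with beta = 0.5, which weights precision more heavily than recall. The following is a minimal Python sketch of that standard metric, not code from the paper or its replication package. The function names `f_beta` and `implied_recall` are hypothetical, chosen only for illustration.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score: a weighted harmonic mean of precision and recall.

    With beta < 1 (here 0.5), precision counts more than recall.
    """
    b2 = beta ** 2
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0  # convention: score is 0 when precision and recall are both 0
    return (1 + b2) * precision * recall / denom


def implied_recall(f_score: float, precision: float, beta: float = 0.5) -> float:
    """Solve the F-beta formula for recall, given the score and precision.

    Hypothetical helper (not from the paper): rearranging
    F = (1 + b^2) * P * R / (b^2 * P + R) gives
    R = b^2 * F * P / ((1 + b^2) * P - F).
    """
    b2 = beta ** 2
    denom = (1 + b2) * precision - f_score
    if denom <= 0:
        raise ValueError("no valid recall exists for this score and precision")
    return b2 * f_score * precision / denom


if __name__ == "__main__":
    # Rounded figures quoted in the abstract: (F0.5, precision) per task.
    for task, (f, p) in {"Finding": (0.73, 0.83), "Matching": (0.65, 0.75)}.items():
        print(task, round(implied_recall(f, p), 2))
```

Plugging in the rounded figures from the abstract suggests a recall of roughly 0.49 for Finding and 0.42 for Matching. These are back-of-the-envelope estimates derived from rounded numbers, not values reported in the text above. They are consistent with a design that favors precision, so that the tests it flags are likely to be genuinely security-related.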
