Table of Contents
Fetching ...

Predicting known Vulnerabilities from Attack News: A Transformer-Based Approach

Refat Othman, Diaeddin Rimawi, Bruno Rossi, Barbara Russo

TL;DR

A semantic similarity-based approach utilizing the multi-qa-mpnet-base-dot-v1 (MPNet) sentence transformer model is proposed to generate a ranked list of the most likely CVEs corresponding to each news report, demonstrating the model's potential to facilitate automated vulnerability identification from real-world cyberattack news reports.

Abstract

Identifying the vulnerabilities exploited during cyberattacks is essential for enabling timely responses and effective mitigation in software security. This paper directly examines the process of predicting software vulnerabilities, specifically Common Vulnerabilities and Exposures (CVEs), from unstructured descriptions of attacks reported in cybersecurity news articles. We propose a semantic similarity-based approach utilizing the multi-qa-mpnet-base-dot-v1 (MPNet) sentence transformer model to generate a ranked list of the most likely CVEs corresponding to each news report. To assess the accuracy of the predicted vulnerabilities, we implement four complementary validation methods: filtering predictions based on similarity thresholds, conducting manual validation, performing semantic comparisons with the first vulnerability explicitly mentioned in each report, and comparing against all CVEs referenced within the report. Experimental results, drawn from a dataset of 100 SecurityWeek news articles, demonstrate that the model attains a precision of 81 percent when employing threshold-based filtering. Manual evaluations report that 70 percent of the predictions are relevant, while comparisons with the initially mentioned CVEs reveal agreement rates of 80 percent with the first listed vulnerability and 78 percent across all referenced CVEs. In 57 percent of the news reports analyzed, at least one predicted vulnerability precisely matched a CVE-ID mentioned in the article. These findings underscore the model's potential to facilitate automated vulnerability identification from real-world cyberattack news reports.

Predicting known Vulnerabilities from Attack News: A Transformer-Based Approach

TL;DR

A semantic similarity-based approach utilizing the multi-qa-mpnet-base-dot-v1 (MPNet) sentence transformer model is proposed to generate a ranked list of the most likely CVEs corresponding to each news report, demonstrating the model's potential to facilitate automated vulnerability identification from real-world cyberattack news reports.

Abstract

Identifying the vulnerabilities exploited during cyberattacks is essential for enabling timely responses and effective mitigation in software security. This paper directly examines the process of predicting software vulnerabilities, specifically Common Vulnerabilities and Exposures (CVEs), from unstructured descriptions of attacks reported in cybersecurity news articles. We propose a semantic similarity-based approach utilizing the multi-qa-mpnet-base-dot-v1 (MPNet) sentence transformer model to generate a ranked list of the most likely CVEs corresponding to each news report. To assess the accuracy of the predicted vulnerabilities, we implement four complementary validation methods: filtering predictions based on similarity thresholds, conducting manual validation, performing semantic comparisons with the first vulnerability explicitly mentioned in each report, and comparing against all CVEs referenced within the report. Experimental results, drawn from a dataset of 100 SecurityWeek news articles, demonstrate that the model attains a precision of 81 percent when employing threshold-based filtering. Manual evaluations report that 70 percent of the predictions are relevant, while comparisons with the initially mentioned CVEs reveal agreement rates of 80 percent with the first listed vulnerability and 78 percent across all referenced CVEs. In 57 percent of the news reports analyzed, at least one predicted vulnerability precisely matched a CVE-ID mentioned in the article. These findings underscore the model's potential to facilitate automated vulnerability identification from real-world cyberattack news reports.
Paper Structure (21 sections, 12 equations, 3 figures, 4 tables)

This paper contains 21 sections, 12 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Overview of methodology.
  • Figure 2: Precision-Recall curve for varying similarity threshold $\rho$.
  • Figure 3: Performance measures of our approach over different $k$ Values in top-$k$.