Table of Contents
Fetching ...

Quantum Clustering for Cybersecurity

Walid El Maouaki, Nouhaila Innan, Alberto Marchisio, Taoufik Said, Mohamed Bennai, Muhammad Shafique

Abstract

In this study, we develop a novel quantum machine learning (QML) framework to analyze cybersecurity vulnerabilities using data from the 2022 CISA Known Exploited Vulnerabilities catalog, which includes detailed information on vulnerability types, severity levels, common vulnerability scoring system (CVSS) scores, and product specifics. Our framework preprocesses this data into a quantum-compatible format, enabling clustering analysis through our advanced quantum techniques, QCSWAPK-means and QkernelK-means. These quantum algorithms demonstrate superior performance compared to state-of-the-art classical clustering techniques like k-means and spectral clustering, achieving Silhouette scores of 0.491, Davies-Bouldin indices below 0.745, and Calinski-Harabasz scores exceeding 884, indicating more distinct and well-separated clusters. Our framework categorizes vulnerabilities into distinct groups, reflecting varying levels of risk severity: Cluster 0, primarily consisting of critical Microsoft-related vulnerabilities; Cluster 1, featuring medium severity vulnerabilities from various enterprise software vendors and network solutions; Cluster 2, with high severity vulnerabilities from Adobe, Cisco, and Google; and Cluster 3, encompassing vulnerabilities from Microsoft and Oracle with high to medium severity. These findings highlight the potential of QML to enhance the precision of vulnerability assessments and prioritization, advancing cybersecurity practices by enabling more strategic and proactive defense mechanisms.

Quantum Clustering for Cybersecurity

Abstract

In this study, we develop a novel quantum machine learning (QML) framework to analyze cybersecurity vulnerabilities using data from the 2022 CISA Known Exploited Vulnerabilities catalog, which includes detailed information on vulnerability types, severity levels, common vulnerability scoring system (CVSS) scores, and product specifics. Our framework preprocesses this data into a quantum-compatible format, enabling clustering analysis through our advanced quantum techniques, QCSWAPK-means and QkernelK-means. These quantum algorithms demonstrate superior performance compared to state-of-the-art classical clustering techniques like k-means and spectral clustering, achieving Silhouette scores of 0.491, Davies-Bouldin indices below 0.745, and Calinski-Harabasz scores exceeding 884, indicating more distinct and well-separated clusters. Our framework categorizes vulnerabilities into distinct groups, reflecting varying levels of risk severity: Cluster 0, primarily consisting of critical Microsoft-related vulnerabilities; Cluster 1, featuring medium severity vulnerabilities from various enterprise software vendors and network solutions; Cluster 2, with high severity vulnerabilities from Adobe, Cisco, and Google; and Cluster 3, encompassing vulnerabilities from Microsoft and Oracle with high to medium severity. These findings highlight the potential of QML to enhance the precision of vulnerability assessments and prioritization, advancing cybersecurity practices by enabling more strategic and proactive defense mechanisms.
Paper Structure (12 sections, 2 equations, 3 figures, 1 table)

This paper contains 12 sections, 2 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: Clustering Process Workflow.
  • Figure 2: The elbow technique identifies the optimal number of clusters for K-means, showing the number of clusters (k) versus the WCSS. The "elbow" of the curve, indicating the optimal number of clusters, occurs at k=4.
  • Figure 3: Clustering outcomes using four algorithms: a) K-means, b) QCSWAPK-means, c) QkernelK-means, d) Spectral clustering. Subplots show clusters in 2D space (Vendor/Project vs. Product). Red X markers indicate centroids. In the classical algorithms, the dataset was normalized to fall within the range of $[0, 1]$. The quantum-enhanced methods QCSWAPK-means and QkernelK-means are normalized to $[0, \pi]$ to meet the specific requirements of quantum encoding.