A robust statistical framework for cyber-vulnerability prioritisation under partial information in threat intelligence

Mario Angelelli; Serena Arima; Christian Catalano; Enrico Ciavolino

TL;DR

The paper develops a robust framework for prioritising cyber-vulnerabilities under partial information. It uses mid-quantile regression (MidQR) to model ordinal vulnerability risk and introduces AGR, a rank-based accuracy measure designed to remain invariant under partial knowledge of the full set of vulnerabilities. The framework combines qualitative CVSS-informed features with quantitative exposure and exploit indicators, and it yields probabilistic risk estimates for ordinal responses through a nonparametric estimate of the conditional distribution $G_{Y|X}$. In extensive simulations and on a real Italian CVE dataset, the authors report that MidQR, assessed with AGR, often outperforms traditional ordered logit and rank-transform approaches, particularly when some vulnerabilities are unknown and knowledge is partial. The framework supports threat intelligence by enabling flexible, interpretable, and robust prioritisation, with practical implications for information disclosure, resource allocation, and adaptive defence strategies.
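To make the mid-quantile idea concrete, the following is a minimal sketch of the unconditional sample mid-distribution function and its interpolated quantile for an ordinal variable. It assumes that $G_{Y|X}$ denotes a conditional version of the standard mid-distribution function $G(y) = P(Y \le y) - 0.5\,P(Y = y)$, and that ordinal levels are coded as increasing numbers. The function names `mid_cdf` and `mid_quantile` are illustrative and do not come from the paper; the sketch ignores covariates, so it is not the authors' MidQR estimator.

```python
import numpy as np

def mid_cdf(sample):
    """Return the sorted distinct levels and the sample mid-CDF at each level.

    Mid-CDF: G(y) = P(Y <= y) - 0.5 * P(Y = y), estimated from frequencies.
    """
    values, counts = np.unique(np.asarray(sample), return_counts=True)
    pmf = counts / counts.sum()      # empirical P(Y = y)
    cdf = np.cumsum(pmf)             # empirical P(Y <= y)
    return values, cdf - 0.5 * pmf

def mid_quantile(sample, tau):
    """Sample mid-quantile at level tau.

    Linearly interpolates the points (G(y_j), y_j); np.interp holds the
    result constant at the lowest/highest level outside the range of G.
    """
    values, g = mid_cdf(sample)
    return float(np.interp(tau, g, values))
```

For example, with the hypothetical risk levels `[1, 1, 2, 3, 3, 3, 4, 4]`, the mid-CDF values are 0.125, 0.3125, 0.5625, and 0.875, and `mid_quantile(..., 0.5)` returns 2.75, a value between levels 2 and 3 rather than a jump to one of them.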

Abstract

Proactive cyber-risk assessment is gaining momentum because many sectors can benefit from preventing cyber-incidents and thereby preserving the integrity, confidentiality, and availability of data. The rising attention to cybersecurity also results from the increasing connectivity of cyber-physical systems, which generates multiple sources of uncertainty about emerging cyber-vulnerabilities. This work introduces a robust statistical framework for quantitative and qualitative reasoning under uncertainty about cyber-vulnerabilities and their prioritisation. Specifically, we use mid-quantile regression to deal with ordinal risk assessments, and we compare it to current alternatives for cyber-risk ranking and graded responses. For this purpose, we identify a novel accuracy measure suited for rank invariance under partial knowledge of the whole set of existing vulnerabilities. The model is tested on both simulated and real data from selected databases that support the evaluation of, exploitation of, or response to cyber-vulnerabilities in realistic contexts. These datasets allow us to compare multiple models and accuracy measures, and to discuss the implications of partial knowledge about cyber-vulnerabilities for threat intelligence and decision-making in operational scenarios.

Paper Structure (28 sections, 17 equations, 9 figures, 10 tables)

Introduction
Related work
Cyber-risk assessment and modelling
Preliminaries on statistical models
Ordered logit model
Rank transform in linear regression
Quantile regression: remarks for cyber-risk assessment
Mid-quantile regression
Contribution and proposed methodology
Estimation: MidQR for robust cyber-vulnerability assessment
A new performance index for cyber-risk prediction under uncertainty
Data sources
Databases
Data description
Experiments and results
...and 13 more sections

Figures (9)

Figure 1: Graphical description of the experiments to validate the efficiency of mid-quantile regression for priority estimates and AGR as an accuracy index of predicted risk levels.
Figure 2: Distribution of levels of variables from the cyber-vulnerability dataset.
Figure 3: QQ-plots of the theoretical (normal) quantiles compared to the empirical quantiles of residuals of $y=10\cdot \log_{10}(1+N_{\mathrm{exp}})$ derived from the exposure $N_{\mathrm{exp}}$ of cyber-vulnerabilities.
Figure 4: Histograms for the empirical distributions of exposure $N_{\mathrm{exp}}$ compared to $10\cdot \log_{10}(1+N_{\mathrm{exp}})$. The corresponding continuous approximations (red dashed lines) highlight multimodality.
Figure 5: Boxplots for RGA and AGR when $k=4$; both uniform and non-uniform probability distributions are considered starting from the data-generating OrdLog model. Boxplots refer, from left to right of the x-axis, to OrdLog, LinReg, MidQR with $\tau$ taking values in $\{0.1, 0.3, 0.5, 0.7, 0.9\}$, and the reference value $\mathrm{RGA}(r_{\mathrm{true}},r_{\mathrm{true}})$.
...and 4 more figures
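
The captions of Figures 3 and 4 refer to the transformed exposure $y=10\cdot \log_{10}(1+N_{\mathrm{exp}})$. A minimal sketch of that transformation follows; the function name `exposure_to_log_scale` is hypothetical, and the input is assumed to be an array of non-negative exposure counts.

```python
import numpy as np

def exposure_to_log_scale(n_exp):
    """Map exposure counts N_exp to y = 10 * log10(1 + N_exp).

    Adding 1 keeps the transform defined at N_exp = 0, where y = 0.
    """
    n_exp = np.asarray(n_exp, dtype=float)
    return 10.0 * np.log10(1.0 + n_exp)
```

On this scale, each tenfold increase in $1+N_{\mathrm{exp}}$ adds 10 to $y$, which compresses the heavy right tail of the raw counts.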

Theorems & Definitions (1)

Example 1
