A robust statistical framework for cyber-vulnerability prioritisation under partial information in threat intelligence
Mario Angelelli, Serena Arima, Christian Catalano, Enrico Ciavolino
TL;DR
The paper develops a robust framework for prioritising cyber-vulnerabilities under partial information by employing mid-quantile regression (MidQR) to model ordinal vulnerability risk and by introducing AGR, an invariant, rank-based accuracy measure for uncertain environments. It merges qualitative CVSS-informed features with quantitative exposure and exploit indicators, delivering probabilistic risk estimates for ordinal responses and a nonparametric conditional distribution via G_{Y|X}. Through extensive simulations and a real Italian CVE dataset, the authors show MidQR plus AGR often outperforms traditional ordered logit and rank-transform approaches, particularly in settings with unknown vulnerabilities and partial knowledge. The framework enhances threat intelligence by enabling flexible, interpretable, and robust prioritisation, with practical implications for information disclosure, resource allocation, and adaptive defence strategies.
Abstract
Proactive cyber-risk assessment is gaining momentum due to the wide range of sectors that can benefit from the prevention of cyber-incidents by preserving the integrity, confidentiality, and availability of data. The rising attention to cybersecurity also results from the increasing connectivity of cyber-physical systems, which generates multiple sources of uncertainty about emerging cyber-vulnerabilities. This work introduces a robust statistical framework for quantitative and qualitative reasoning under uncertainty about cyber-vulnerabilities and their prioritisation. Specifically, we take advantage of mid-quantile regression to deal with ordinal risk assessments, and we compare it to current alternatives for cyber-risk ranking and graded responses. For this purpose, we identify a novel accuracy measure suited for rank invariance under partial knowledge of the whole set of existing vulnerabilities. The model is tested on both simulated and real data from selected databases that support the evaluation and exploitation of, or the response to, cyber-vulnerabilities in realistic contexts. Such datasets allow us to compare multiple models and accuracy measures, discussing the implications of partial knowledge about cyber-vulnerabilities on threat intelligence and decision-making in operational scenarios.
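As a brief illustration of the quantity underlying mid-quantile regression, the sketch below computes the standard unconditional empirical mid-distribution function, G(y) = P(Y ≤ y) − 0.5 · P(Y = y), for a discrete ordinal sample. It is a minimal sketch and not the conditional estimator used in this work: the function name `empirical_mid_cdf` and the severity scores are hypothetical, chosen only for illustration.

```python
from collections import Counter


def empirical_mid_cdf(samples):
    """Return {level: G(level)} with G(y) = P(Y <= y) - 0.5 * P(Y = y).

    Unconditional empirical version only; covariates are ignored here.
    """
    n = len(samples)
    counts = Counter(samples)
    mid_cdf = {}
    cumulative = 0.0
    for level in sorted(counts):
        p = counts[level] / n          # empirical probability mass at this level
        cumulative += p                # empirical CDF at this level
        mid_cdf[level] = cumulative - 0.5 * p
    return mid_cdf


# Hypothetical ordinal severity scores (1 = low, 2 = medium, 3 = high).
scores = [1, 1, 2, 2, 2, 2, 3, 3]
G = empirical_mid_cdf(scores)
print(G)
```

Subtracting half of the probability mass at each level places the mid-CDF value halfway up each jump of the ordinary step CDF. Mid-quantiles are then obtained by linearly interpolating between these points and inverting, which is what makes quantile-type modelling applicable to discrete or ordinal responses.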
