Assessing and Prioritizing Ransomware Risk Based on Historical Victim Data
Spencer Massengale, Philip Huff
TL;DR
The paper predicts which ransomware actors are likely to target an entity, using historical victim data so that defenses can be prioritized. It introduces a pipeline that uses LLMs to extract SKRAM threat actor profiles from public disclosures, converts the outputs into STIX, and augments the data synthetically. A Random Forest classifier predicts a risk score for each entity; it is validated on a dataset augmented with synthetic data and with time-sensitive activity captured by an exponentially weighted moving average (EWMA) metric, $V_t = \lambda V_{t-1} + (1-\lambda) x_t$. The paper discusses limitations due to data scarcity and geographic bias and outlines future directions, including broader LLM comparisons and real-world data integration, to make ransomware risk assessment more actionable for organizations.
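To make the EWMA recurrence $V_t = \lambda V_{t-1} + (1-\lambda) x_t$ concrete, the following is a minimal sketch, not the authors' implementation. It assumes that $x_t$ is an activity measure per time period (for example, the number of victims a group disclosed in period $t$) and that the series starts from $V_0 = 0$; the function name `ewma`, the parameter names, and the example value of $\lambda$ are hypothetical choices made for illustration.

```python
from typing import Iterable, List


def ewma(observations: Iterable[float], lam: float, v0: float = 0.0) -> List[float]:
    """Return the EWMA series V_1..V_n for V_t = lam * V_{t-1} + (1 - lam) * x_t.

    Assumptions (not stated in the source): `observations` holds one activity
    value x_t per time period, and the recurrence starts from V_0 = v0.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")

    series = []
    v = v0
    for x in observations:
        # A larger lam keeps more of the history; a smaller lam reacts
        # faster to the newest observation.
        v = lam * v + (1.0 - lam) * x
        series.append(v)
    return series


if __name__ == "__main__":
    # Hypothetical activity counts: a burst followed by two quiet periods.
    print(ewma([4, 0, 0], lam=0.5))
```

With $\lambda = 0.5$, a burst of activity followed by quiet periods decays by half each period, which is the sense in which the metric is time-sensitive: recent activity dominates, while older activity still contributes with geometrically shrinking weight.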
Abstract
We present an approach to identifying which ransomware adversaries are most likely to target specific entities, thereby helping these entities formulate better protection strategies. Ransomware poses a formidable cybersecurity threat characterized by profit-driven motives, a complex underlying economy supporting criminal syndicates, and the overt nature of its attacks. It has consistently ranked among the most prevalent types of malware, and its activity has escalated rapidly. Recent estimates indicate that approximately two-thirds of organizations experienced ransomware attacks in 2023 \cite{Sophos2023Ransomware}. A central tactic in ransomware campaigns is publicizing attacks to coerce victims into paying ransoms. Our study uses public disclosures of ransomware victims to predict the likelihood that an entity will be targeted by a specific ransomware variant. We employ a Large Language Model (LLM) architecture that uses a unique chain-of-thought, multi-shot prompt methodology to define adversary SKRAM (Skills, Knowledge, Resources, Authorities, and Motivation) profiles from ransomware bulletins, threat reports, and news items. This analysis is enriched with publicly available victim data and further enhanced by a heuristic for generating synthetic data that reflects victim profiles. Our work culminates in a machine learning model that helps organizations prioritize ransomware threats and formulate defenses based on the tactics, techniques, and procedures (TTPs) of the most likely attackers.
