Assessing and Prioritizing Ransomware Risk Based on Historical Victim Data

Spencer Massengale, Philip Huff

TL;DR

The paper predicts which ransomware groups are likely to target a given entity, using historical victim data to prioritize defenses. It introduces a pipeline that uses LLMs to extract SKRAM threat actor profiles from public disclosures, converts the outputs into STIX, and augments the data synthetically. A Random Forest classifier predicts a risk score for each entity; it is validated on a dataset augmented with synthetic data, with time-sensitive activity captured by an EWMA metric $V_t = \lambda V_{t-1} + (1-\lambda) x_t$. The paper discusses limitations due to data scarcity and geographic bias and outlines future directions, including broader LLM comparisons and real-world data integration, to make ransomware risk assessment more actionable for organizations.
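To make the EWMA recurrence concrete, the following is a minimal sketch in Python of $V_t = \lambda V_{t-1} + (1-\lambda) x_t$ exactly as written above. The function name `ewma`, the starting value `v0 = 0`, the choice $\lambda = 0.5$, and the reading of $x_t$ as a per-period victim count are all assumptions made for illustration; the summary does not specify them.

```python
from typing import Iterable, List


def ewma(observations: Iterable[float], lam: float, v0: float = 0.0) -> List[float]:
    """Return the sequence V_1..V_T for V_t = lam * V_{t-1} + (1 - lam) * x_t.

    lam is the weight on the previous value, so a larger lam means
    older activity fades more slowly. v0 is an assumed starting value.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    values: List[float] = []
    v = v0
    for x in observations:
        # Blend the previous smoothed value with the newest observation.
        v = lam * v + (1.0 - lam) * x
        values.append(v)
    return values


# Hypothetical per-period victim counts for one ransomware group.
activity = [0, 0, 4, 0]
print(ewma(activity, lam=0.5))  # [0.0, 0.0, 2.0, 1.0]
```

In this toy series the burst of four victims in the third period lifts the metric to 2.0, and it decays to 1.0 one quiet period later, which is the time-sensitivity the metric is meant to capture.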

Abstract

We present an approach to identifying which ransomware adversaries are most likely to target specific entities, thereby helping these entities formulate better protection strategies. Ransomware poses a formidable cybersecurity threat characterized by profit-driven motives, a complex underlying economy supporting criminal syndicates, and the overt nature of its attacks. It has consistently ranked among the most prevalent types of malware, and its activity has escalated rapidly. Recent estimates indicate that approximately two-thirds of organizations experienced ransomware attacks in 2023 \cite{Sophos2023Ransomware}. A central tactic in ransomware campaigns is publicizing attacks to coerce victims into paying ransoms. Our study uses public disclosures from ransomware victims to predict the likelihood of an entity being targeted by a specific ransomware variant. We employ a Large Language Model (LLM) architecture that uses a unique chain-of-thought, multi-shot prompt methodology to define adversary SKRAM (Skills, Knowledge, Resources, Authorities, and Motivation) profiles from ransomware bulletins, threat reports, and news items. This analysis is enriched with publicly available victim data and further enhanced by a heuristic for generating synthetic data that reflects victim profiles. Our work culminates in a machine learning model that helps organizations prioritize ransomware threats and formulate defenses based on the tactics, techniques, and procedures (TTPs) of the most likely attackers.

Paper Structure

This paper contains 19 sections, 2 equations, 6 figures, 2 tables, 2 algorithms.

Figures (6)

  • Figure 1: Class diagram illustrating the integration of prompt design and natural language content in our Chat Completion Feature Extraction System (CCFE).
  • Figure 2: This table displays the weighting assigned to each feature used in the predictive model for ransomware victim likelihood.
  • Figure 3: This image illustrates the predictions for the Ransomware Groups Phobos and Rhysida regarding their likelihood of targeting an entity. The feature importance for the prediction with an "Extremely High" likelihood is detailed in Figure 4, while the prediction with a "Low" likelihood is detailed in Figure 5.
  • Figure 4: The chart illustrates the feature importance of the entity detailed in Figure 3 who has an "Extremely High" likelihood of being targeted by the Ransomware Group Phobos.
  • Figure 5: The chart illustrates the feature importance of the entity detailed in Figure 3 who has a "Low" likelihood of being targeted by the Ransomware Group Rhysida.
  • ...and 1 more figure