
Learning the Language of NVMe Streams for Ransomware Detection

Barak Bringoltz, Elisha Halperin, Ran Feraru, Evgeny Blaichman, Amit Berman

TL;DR

The paper tackles ransomware detection in NVMe IO streams by modeling sequential command streams with two transformer-based architectures: a Command-Level Transformer (CLT) for per-command classification and a Patch-Level Transformer (PLT) for estimating the ransomware IO volume within patches of commands. It introduces a novel tokenization and embedding scheme tailored to NVMe attributes, enabling token-level learning and context-aware predictions. On a large labeled dataset spanning hundreds of ransomware variants and benign workloads, CLT and PLT outperform state-of-the-art tabular methods across multiple metrics, with fewer missed detections, less data loss, and better identification of data accessed by ransomware. The authors also demonstrate robustness to unseen ransomware, quantify per-token prediction accuracy, analyze the importance of context, run feature ablations, and discuss hardware versus software deployment, arguing for SSD-controller implementations that preserve throughput and data recoverability.
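
The summary does not spell out the tokenization scheme, so the following is only a minimal sketch of one way NVMe command attributes could be mapped to discrete token IDs. The record fields (`opcode`, `lba`, `num_blocks`, `delta_us`), the log2 bucketing, the vocabulary sizes, and all function names are hypothetical illustrations, not the authors' design. Each command becomes one token described by several per-attribute IDs, which an embedding layer could look up and combine.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class NvmeCommand:
    # Hypothetical command record; field names are illustrative, not the paper's schema.
    opcode: str       # "read" or "write"
    lba: int          # starting logical block address
    num_blocks: int   # transfer length in logical blocks
    delta_us: float   # time since the previous command, in microseconds


OPCODES = {"read": 0, "write": 1}


def log2_bucket(value: float, num_buckets: int) -> int:
    """Map a non-negative value to a log2-spaced bucket, clipped to the vocabulary size."""
    if value < 1:
        return 0
    return min(int(math.log2(value)) + 1, num_buckets - 1)


def tokenize(cmd: NvmeCommand, prev_lba: Optional[int]) -> Dict[str, int]:
    """Turn one command into discrete per-attribute token IDs (hypothetical scheme)."""
    lba_jump = 0 if prev_lba is None else abs(cmd.lba - prev_lba)
    return {
        "opcode": OPCODES[cmd.opcode],
        "size": log2_bucket(cmd.num_blocks, 16),
        "lba_jump": log2_bucket(lba_jump, 32),   # distance from the previous command's LBA
        "delta_t": log2_bucket(cmd.delta_us, 24),
    }


def tokenize_stream(cmds: List[NvmeCommand]) -> List[Dict[str, int]]:
    """Tokenize a command sequence, carrying the previous LBA as context."""
    tokens, prev_lba = [], None
    for cmd in cmds:
        tokens.append(tokenize(cmd, prev_lba))
        prev_lba = cmd.lba
    return tokens
```

Log-spaced buckets are a common choice for heavy-tailed quantities such as transfer sizes and inter-arrival times, because they keep the vocabulary small while still separating small and very large values.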

Abstract

We apply language modeling techniques to detect ransomware activity in NVMe command sequences. We design and train two types of transformer-based models: the Command-Level Transformer (CLT) performs in-context token classification to determine whether individual commands are initiated by ransomware, and the Patch-Level Transformer (PLT) predicts the volume of data accessed by ransomware within a patch of commands. We present both model designs and the corresponding tokenization and embedding schemes and show that they improve over state-of-the-art tabular methods by up to 24% in missed-detection rate, 66% in data loss prevention, and 84% in identifying data accessed by ransomware.
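
To make the two prediction targets concrete, here is a hedged sketch of how training labels could be derived from a labeled command trace: one binary label per command for the CLT, and one value per patch for the PLT. It assumes a hypothetical trace format of `(transfer_bytes, is_ransomware)` pairs and non-overlapping patches of fixed size. It also assumes the PLT target is normalized to a fraction of the patch's IO bytes, in line with the "fraction of ransomware IO volume" used in Figure 4. The helper names are illustrative only.

```python
from typing import List, Tuple

# Hypothetical labeled trace: one (transfer_bytes, is_ransomware) pair per NVMe command.
Command = Tuple[int, bool]


def clt_labels(cmds: List[Command]) -> List[int]:
    """Per-command (per-token) targets for in-context token classification."""
    return [int(is_ransomware) for _, is_ransomware in cmds]


def plt_targets(cmds: List[Command], patch_size: int) -> List[float]:
    """Per-patch targets: fraction of the patch's IO bytes issued by ransomware.

    Patches are assumed to be non-overlapping windows of `patch_size` commands;
    a trailing partial patch is dropped.
    """
    targets = []
    for start in range(0, len(cmds) - patch_size + 1, patch_size):
        patch = cmds[start:start + patch_size]
        total = sum(nbytes for nbytes, _ in patch)
        ransomware = sum(nbytes for nbytes, is_rw in patch if is_rw)
        targets.append(ransomware / total if total > 0 else 0.0)
    return targets
```

Weighting by bytes rather than counting commands means a single large ransomware write can outweigh many small benign reads, which matches a target phrased in terms of data volume.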

Paper Structure

This paper contains 41 sections, 7 equations, 8 figures, 18 tables.

Figures (8)

  • Figure 1: A high-level diagram of our ransomware detection pipelines for both the (a) CLT (left panel) and (b) PLT (right panel) models. Both first sample and process a set of commands from the NVMe stream. Then, each tokenizes the commands, feeds the tokens into its model, and outputs a prediction per token. These token predictions are pooled to produce a final binary prediction.
  • Figure 2: The cumulative distribution function of $MBD$ for the four best models in the $P_{miss}$ and $MBD$ results, with their $1\sigma$ spread.
  • Figure 3: The in-distribution (id) and out-of-distribution (ood) results for $MBD_3$ and $P_{miss}$. The error bars account for both the variability of results across folds and the uncertainty within each fold.
  • Figure 4: A histogram heatmap showing the correlation between the actual fraction of ransomware IO volume and the PLT prediction of that fraction.
  • Figure 5: CLT accuracy on data slices versus the fraction of ransomware commands in each slice.
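
The Figure 1 caption states that per-token predictions are pooled into a final binary prediction but does not say how. The sketch below assumes two simple pooling rules: for the CLT, threshold each command's ransomware probability and raise an alarm if enough commands are flagged; for the PLT, average the predicted per-patch ransomware IO fractions and compare the mean to a threshold. The function names, thresholds, and pooling rules are hypothetical choices for illustration.

```python
from statistics import mean
from typing import Sequence


def pool_clt(token_probs: Sequence[float], token_threshold: float = 0.5,
             alarm_fraction: float = 0.1) -> bool:
    """Alarm if the fraction of commands classified as ransomware reaches alarm_fraction."""
    if not token_probs:
        return False
    flagged = sum(p >= token_threshold for p in token_probs)
    return flagged / len(token_probs) >= alarm_fraction


def pool_plt(patch_fractions: Sequence[float], alarm_fraction: float = 0.1) -> bool:
    """Alarm if the mean predicted ransomware IO fraction reaches alarm_fraction."""
    if not patch_fractions:
        return False
    return mean(patch_fractions) >= alarm_fraction
```

Both thresholds trade missed detections against false alarms. In practice they would be tuned on held-out data rather than fixed as in this sketch.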