
Designing a reliable lateral movement detector using a graph foundation model

Corentin Larroche

TL;DR

This paper investigates the use of graph foundation models (GFMs) for cybersecurity, focusing on lateral movement detection. It introduces UltraLMD++, a GFM-based detector that requires no domain-specific retraining yet achieves performance competitive with, and often superior to, that of state-of-the-art GNN-based methods. This performance comes from careful construction of the input graph combined with post-processing of the model's output: the detector scores edges using long-term and short-term context graphs, then applies a graph-aware refinement step that emphasizes clustered anomalies. Evaluated on two public datasets (OpTC and LANL), the approach yields favorable results and demonstrates real-time feasibility. The findings highlight the practical potential of GFMs in cybersecurity: practitioners can build effective detectors by focusing on data representation and post-processing rather than model training. The paper also outlines future work on fine-tuning, retrieval improvements, and adversarial considerations.
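This summary does not spell out the refinement rule, so the following is only a minimal sketch of what a graph-aware refinement step could look like. It assumes each edge already has a raw anomaly score (for example, one produced by a GFM from the long-term and short-term context graphs). The function name `refine_scores`, the blending weight `alpha`, and the choice of averaging over edges that share an endpoint are hypothetical illustrations, not the paper's method. The intent is that an edge whose neighboring edges are also suspicious keeps a high score, while an isolated high-score edge is damped.

```python
from collections import defaultdict
from typing import Dict, Hashable, Tuple

# An edge is a (source host, destination host) pair.
Edge = Tuple[Hashable, Hashable]


def refine_scores(scores: Dict[Edge, float], alpha: float = 0.5) -> Dict[Edge, float]:
    """Blend each edge's raw score with the mean raw score of the other edges
    that share an endpoint with it (hypothetical refinement rule)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    # Index edges by the nodes they touch.
    incident = defaultdict(list)
    for edge in scores:
        u, v = edge
        incident[u].append(edge)
        if v != u:  # avoid listing a self-loop twice
            incident[v].append(edge)

    refined = {}
    for edge, raw in scores.items():
        u, v = edge
        # All other edges touching either endpoint of this edge.
        neighbors = {e for node in (u, v) for e in incident[node] if e != edge}
        if neighbors:
            context = sum(scores[e] for e in neighbors) / len(neighbors)
        else:
            context = 0.0  # isolated edge: no corroborating evidence nearby
        refined[edge] = (1.0 - alpha) * raw + alpha * context
    return refined


# Two adjacent suspicious edges versus one isolated edge with the same raw score.
raw = {("a", "b"): 0.9, ("b", "c"): 0.8, ("x", "y"): 0.9}
print(refine_scores(raw))
```

With `alpha=0.5`, the isolated edge `("x", "y")` drops to about 0.45, while `("a", "b")` stays near 0.85 because its neighbor is also suspicious. Averaging dilutes evidence on busy hosts with many benign edges; taking the maximum neighbor score instead is an equally plausible variant.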

Abstract

Foundation models have recently emerged as a new paradigm in machine learning (ML). These models are pre-trained on large and diverse datasets and can subsequently be applied to various downstream tasks with little or no retraining. This allows people without advanced ML expertise to build ML applications, accelerating innovation across many fields. However, the adoption of foundation models in cybersecurity is hindered by their inability to efficiently process data such as network traffic captures or binary executables. The recent introduction of graph foundation models (GFMs) could make a significant difference, as graphs are well-suited to representing these types of data. We study the usability of GFMs in cybersecurity through the lens of one specific use case, namely lateral movement detection. Using a pre-trained GFM, we build a detector that reaches state-of-the-art performance without requiring any training on domain-specific data. This case study thus provides compelling evidence of the potential of GFMs for cybersecurity.
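To make concrete the claim that security logs map naturally onto graphs, here is a minimal sketch that turns authentication events into knowledge-graph triples of the kind a GFM could consume. The field names, relation names, and the `AuthEvent`, `event_to_triples`, and `build_graph` helpers are hypothetical. The schema actually used by UltraLMD++ is the one depicted in Figure 2, which also includes authentication package (AP) and logon type (LT) nodes.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple

# (head entity, relation, tail entity)
Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class AuthEvent:
    """One authentication log record (hypothetical field names)."""
    user: str
    src_host: str
    dst_host: str
    auth_package: str  # "AP" in Figure 2
    logon_type: str    # "LT" in Figure 2


def event_to_triples(event: AuthEvent) -> List[Triple]:
    """Turn one event into knowledge-graph triples (hypothetical schema)."""
    return [
        (event.src_host, "authenticates_to", event.dst_host),
        (event.user, "logs_on_from", event.src_host),
        (event.user, "logs_on_to", event.dst_host),
        # Simplification: AP and LT are attached to the destination host.
        # Reifying each event as its own node would preserve which
        # connection used which AP and LT.
        (event.dst_host, "uses_auth_package", event.auth_package),
        (event.dst_host, "uses_logon_type", event.logon_type),
    ]


def build_graph(events: Iterable[AuthEvent]) -> Set[Triple]:
    """Collect triples from all events, storing each fact only once."""
    triples: Set[Triple] = set()
    for ev in events:
        triples.update(event_to_triples(ev))
    return triples


ev = AuthEvent("u1", "h1", "h2", "Kerberos", "Network")
print(sorted(build_graph([ev, ev])))
```

Repeated events collapse into the same set of facts, so graph size grows with the number of distinct relationships rather than the raw log volume.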

Paper Structure

This paper contains 29 sections, 5 equations, 6 figures, 3 tables, and 1 algorithm.

Figures (6)

  • Figure 1: High-level workflow of UltraLMD++.
  • Figure 2: Knowledge graph representation of the two datasets for UltraLMD++. AP and LT stand for authentication package and logon type, respectively. Nodes and edges representing node types are not displayed for the sake of readability.
  • Figure 3: Evolution of the $90^{\mathrm{th}}$, $99^{\mathrm{th}}$, and $99.9^{\mathrm{th}}$ percentiles of the distribution of anomaly scores over time on OpTC, for Argus and UltraLMD++.
  • Figure 4: Run time per window (mean and standard deviation) of UltraLMD++ on OpTC.
  • Figure 5: Evolution of the $90^{\mathrm{th}}$, $99^{\mathrm{th}}$, and $99.9^{\mathrm{th}}$ percentiles of the distribution of anomaly scores over time on LANL, for Argus and UltraLMD++.
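Figures 3 and 5 track the 90th, 99th, and 99.9th percentiles of the anomaly-score distribution over time. The following is a minimal sketch of that bookkeeping. It assumes scores have already been grouped into time windows. The helper names `percentile` and `track_percentiles`, as well as the linear-interpolation convention (the same one NumPy uses by default), are assumptions rather than details taken from the paper.

```python
import math
from typing import Dict, List, Sequence


def percentile(values: Sequence[float], p: float) -> float:
    """p-th percentile (0 <= p <= 100) with linear interpolation between ranks."""
    if not values:
        raise ValueError("empty window")
    ordered = sorted(values)
    pos = (p / 100.0) * (len(ordered) - 1)  # fractional rank
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac


def track_percentiles(
    windows: List[Sequence[float]],
    ps: Sequence[float] = (90.0, 99.0, 99.9),
) -> List[Dict[float, float]]:
    """Summarize each time window's anomaly scores by a few upper percentiles."""
    return [{p: percentile(w, p) for p in ps} for w in windows]


# Two hypothetical windows of anomaly scores.
for i, summary in enumerate(track_percentiles([[0.1, 0.2, 0.3], [0.5] * 10])):
    print(i, summary)
```

Watching upper percentiles rather than the mean makes it easier to see whether a detector's score distribution stays stable over time, which affects how reliably a fixed alert threshold behaves.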