SmartGuard: Leveraging Large Language Models for Network Attack Detection through Audit Log Analysis and Summarization

Hao Zhang, Shuo Shao, Song Li, Zhenyu Zhong, Yan Liu, Zhan Qin

TL;DR

SmartGuard is an automated method that combines behaviors abstracted from audit event semantics with large language models. It can also be fine-tuned, which lets experts help keep the system up to date.

Abstract

End-point monitoring solutions are widely deployed in today's enterprise environments to support advanced attack detection and investigation. These monitors continuously record system-level activities as audit logs and provide deep visibility into security events. Unfortunately, existing methods of semantic analysis based on audit logs have low granularity, reaching only the system call level, which makes it difficult to classify highly covert behaviors effectively. Additionally, existing work mainly matches audit log streams against rule knowledge bases that describe behaviors; this approach relies heavily on expertise and can neither detect unknown attacks nor provide interpretive descriptions. In this paper, we propose SmartGuard, an automated method that combines behaviors abstracted from audit event semantics with large language models. SmartGuard extracts specific behaviors (function level) from incoming system logs and constructs a knowledge graph, divides events by threads, and combines event summaries with graph embeddings to diagnose the information and provide explanatory narratives through large language models. Our evaluation shows that SmartGuard achieves an average F1 score of 96% in assessing malicious behaviors and demonstrates good scalability across multiple models and unknown attacks. It also possesses excellent fine-tuning capabilities, allowing experts to assist in timely system updates.

Paper Structure

This paper contains 28 sections, 3 equations, 10 figures, 7 tables.

Figures (10)

  • Figure 1: Scenario example. The nodes in the figure are system entities (rectangles represent functions, rounded rectangles represent addresses and files, ellipses represent sockets, and diamonds represent databases). The edges between the nodes represent system calls. For clarity, we color-code the source data objects in the behavior, with red and yellow representing high-risk behaviors.
  • Figure 2: Attack subgraph for stealing information from different databases. We color-coded the data objects, with yellow indicating similar behavior semantics.
  • Figure 3: SmartGuard Overview. First, we extract specific behaviors (function-level) from the logs and construct a knowledge graph. Second, we divide the behaviors according to threads and extract text summaries. Then, we perform embeddings on the extracted behavior subgraphs and combine them with text to form behavior semantics. Finally, we use a large language model to diagnose the behavior semantics and provide explanatory narratives.
  • Figure 4: The prompt to predict incident category.
  • Figure 6: Interpretive narrative of the behavior summary by the large language model.
  • ...and 5 more figures
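
The first two stages described in the Figure 3 caption (collecting function-level behaviors as graph edges, then dividing them by thread and producing a text summary) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the event tuple layout, the sample events, and the helper names `split_by_thread` and `summarize` are all assumptions made for illustration, and the graph embedding and large language model stages are left out.

```python
from collections import defaultdict

# Hypothetical audit events (assumed layout, not taken from the paper):
# (thread_id, source_function, system_call, target_entity)
EVENTS = [
    (101, "read_config", "open", "/etc/app.conf"),
    (101, "read_config", "read", "/etc/app.conf"),
    (202, "dump_table", "connect", "db:users"),
    (202, "dump_table", "sendto", "socket:10.0.0.5:443"),
]


def split_by_thread(events):
    """Group edges (source, call, target) into one subgraph per thread."""
    groups = defaultdict(list)
    for thread_id, source, call, target in events:
        groups[thread_id].append((source, call, target))
    return dict(groups)


def summarize(edges):
    """Render one thread's edges as a plain-text behavior summary."""
    return "; ".join(f"{source} {call} {target}" for source, call, target in edges)


subgraphs = split_by_thread(EVENTS)
summaries = {tid: summarize(edges) for tid, edges in subgraphs.items()}

for tid, text in summaries.items():
    print(f"thread {tid}: {text}")
```

In a pipeline shaped like the one in Figure 3, each per-thread summary would presumably be paired with an embedding of the matching subgraph before being passed to the language model; how that pairing is done is not specified in this excerpt.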