SAGA: Synthetic Audit Log Generation for APT Campaigns

Yi-Ting Huang, Ying-Ren Guo, Yu-Sheng Yang, Guo-Wei Wong, Yu-Zih Jheng, Yeali Sun, Jessemyn Modini, Timothy Lynar, Meng Chang Chen

TL;DR

SAGA introduces a fully automated framework to generate configurable, finely labeled synthetic audit logs that emulate real system logs and embed APT campaigns aligned to MITRE ATT&CK. By extracting attack patterns from red-team emulations, abstracting them into templates, and instantiating diverse artifacts, SAGA produces logs suitable for training deep learning models and benchmarking multiple detection methods. The authors demonstrate usefulness through intrusion detection, technique hunting, and campaign attribution experiments, showing that models trained on synthetic logs can generalize to unseen techniques and campaigns. This synthetic-data approach provides scalable, scenario-driven benchmarks to advance ML-based defense, while the authors acknowledge limitations related to realism and distributional shift. The work highlights the practical value of evaluating and developing APT detection methods in a controlled, reproducible setting and outlines avenues for further enhancement and cross-platform applicability.

Abstract

With the increasing sophistication of Advanced Persistent Threats (APTs), the demand for effective detection and mitigation methods has escalated. Program execution leaves traces in the system audit log, which can be analyzed to detect malicious activities. However, collecting and analyzing large volumes of audit logs over extended periods is challenging, and insufficient labeling further limits their usability. To address these challenges, this paper introduces SAGA (Synthetic Audit log Generation for APT campaigns), a novel approach for generating fine-grained labeled synthetic audit logs that mimic real-world system logs while embedding stealthy APT attacks. SAGA generates configurable audit logs of arbitrary duration, blending benign logs from normal operations with malicious logs based on the definitions in the MITRE ATT&CK framework. Malicious audit logs follow an APT lifecycle, incorporating various attack techniques at each stage. These synthetic logs can serve as benchmark datasets for training machine learning models and assessing diverse APT detection methods. To demonstrate the usefulness of synthetic audit logs, we ran established baselines of event-based technique hunting and APT campaign detection using various synthetic audit logs. In addition, we show that a deep learning model trained on synthetic audit logs can detect previously unseen techniques within audit logs.
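
As a rough illustration of the blending idea described in the abstract, the sketch below merges a benign event stream with a labeled malicious one by timestamp. It is not SAGA's implementation: the `Event` record, its fields, the `blend` helper, and the sample events are all hypothetical, chosen only to show how per-event technique labels could survive the merge.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical audit log record (not SAGA's actual schema)."""
    timestamp: float                  # seconds since the start of the log
    process: str
    operation: str
    target: str
    technique: Optional[str] = None   # ATT&CK technique ID, None if benign


def blend(benign: Iterable[Event], malicious: Iterable[Event]) -> Iterator[Event]:
    """Merge two timestamp-sorted streams into one, preserving labels."""
    return heapq.merge(benign, malicious, key=lambda e: e.timestamp)


# Illustrative events only; both input lists must already be sorted by time.
benign = [
    Event(0.0, "explorer.exe", "ReadFile", "C:\\Users\\a\\doc.txt"),
    Event(5.0, "chrome.exe", "TCP Connect", "93.184.216.34:443"),
]
malicious = [
    Event(2.5, "powershell.exe", "CreateFile", "C:\\Temp\\payload.dll",
          technique="T1059.001"),
]

log = list(blend(benign, malicious))
for event in log:
    label = event.technique or "benign"
    print(f"{event.timestamp:6.1f}  {event.process:<16} {event.operation:<12} {label}")
```

Keeping the label on the event itself, rather than in a separate file, means any downstream detector can be scored per event without re-aligning timestamps.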

Paper Structure

This paper contains 22 sections, 8 figures, 8 tables, and 2 algorithms.

Figures (8)

  • Figure 1: Example of audit log events captured by Procmon.
  • Figure 2: ⓐ Mandiant attack lifecycle of APT28. ⓑ The APT28 attack provenance graph, derived from APT28, presents five malicious events labeled with six techniques, which are mapped to the corresponding stages of the attack lifecycle. In each zoomed-in box, the color of the dotted boundary corresponds to its stage in the attack lifecycle. Rectangular nodes represent files, diamond-shaped nodes represent sockets, oval nodes represent processes, and edges show the causal relationships between entities. Red arrow edges indicate the sequence of malicious events.
  • Figure 3: SAGA workflow
  • Figure 4: Attack pattern template model
  • Figure 5: APT28 Attack pattern template example
  • ...and 3 more figures