ATLASv2: ATLAS Attack Engagements, Version 2
Andy Riddle, Kim Westfall, Adam Bates
TL;DR
ATLASv2 addresses the need for realistic, noisy attack data by extending the prior ATLAS dataset with higher-quality benign activity and additional telemetry from Sysmon and VMware Carbon Black Cloud. The authors generate the dataset by having two researchers actively use two Windows 7 VMs over four days of benign activity followed by a fifth day on which the attacks are carried out, capturing extensive logs from ETW, DNS, Firefox, and Carbon Black sensors. They provide 154 GB of multi-source logs and ground-truth labels for ten attack scenarios (s1-s4 and m1-m6), enabling evaluation of anomaly detection and log analysis. However, the attack steps are highly similar across scenarios, which limits the dataset's suitability for multi-class supervised learning; its realism and public availability nonetheless support reproducibility and practical detection research.
Abstract
ATLASv2 is based on a previously generated dataset included in "ATLAS: A Sequence-based Learning Approach for Attack Investigation." The original ATLAS dataset consists of Windows Security Auditing system logs, Firefox logs, and DNS logs captured with Wireshark. In ATLASv2, we aim to enrich the ATLAS dataset with higher-quality background noise and additional logging vantage points. This work replicates the ten attack scenarios described in ATLAS, but extends the logging to include Sysmon logs and events tracked through VMware Carbon Black Cloud. The main contribution of ATLASv2 is improving both the quality of the benign system activity and the integration of the attack scenarios into that activity. Instead of relying on automated scripts to generate activity, we had two researchers use the victim machines as their primary workstations throughout the engagement. This allowed us to capture system logs of actual user behavior. Additionally, the researchers conducted the attacks in a lab setting, which allowed the attacks to be integrated into the victim user's workflow. As a result, the ATLASv2 dataset provides realistic system logs that mirror the log activity generated in real-world attacks.
