
CICAPT-IIOT: A provenance-based APT attack dataset for IIoT environment

Erfan Ghiasvand, Suprio Ray, Shahrear Iqbal, Sajjad Dadkhah, Ali A. Ghorbani

TL;DR

This work addresses the scarcity of IIoT-focused datasets for detecting advanced persistent threats by introducing CICAPT-IIoT, a semi-synthetic dataset generated on a hybrid IIoT testbed that captures both network traffic and provenance data. The dataset encompasses over 20 attack techniques mapped to eight MITRE ATT&CK-based tactics, including data collection, exfiltration, discovery, persistence, defense evasion, and lateral movement, to reflect realistic APT campaigns. The authors describe the integrated testbed, attack emulation plan adapted to a Linux/Caldera-like environment, and phase-based data collection that yields a ~10 GB corpus with rich provenance graphs and PCAP-derived features. They also compare CICAPT-IIoT with existing datasets, arguing that multi-source fusion and comprehensive IIoT-context coverage enhance realism and the potential for robust provenance-enabled APT detection research in industrial settings.

Abstract

The Industrial Internet of Things (IIoT) is a transformative paradigm that integrates smart sensors, advanced analytics, and robust connectivity within industrial processes, enabling real-time data-driven decision-making and enhancing operational efficiency across diverse sectors, including manufacturing, energy, and logistics. IIoT is susceptible to various attack vectors, with Advanced Persistent Threats (APTs) posing a particularly grave concern due to their stealthy, prolonged, and targeted nature. The effectiveness of machine learning-based intrusion detection systems in APT detection has been documented in the literature. However, existing cybersecurity datasets often lack crucial attributes for APT detection in IIoT environments. Incorporating insights from prior research on APT detection using provenance data and intrusion detection within IoT systems, we present the CICAPT-IIoT dataset. The main goal of this paper is to propose a novel APT dataset in the IIoT setting that includes essential information for the APT detection task. To achieve this, a testbed for IIoT is developed, and over 20 attack techniques frequently used in APT campaigns are included. The performed attacks create some of the invariant phases of the APT cycle, including Data Collection and Exfiltration, Discovery and Lateral Movement, Defense Evasion, and Persistence. By integrating network logs and provenance logs with detailed attack information, the CICAPT-IIoT dataset presents a foundation for developing holistic cybersecurity measures. Additionally, a comprehensive dataset analysis is provided, presenting cybersecurity experts with a strong basis on which to build innovative and efficient security solutions.

Paper Structure (13 sections, 2 figures, 4 tables)