Hack Me If You Can: Aggregating AutoEncoders for Countering Persistent Access Threats Within Highly Imbalanced Data

Sidahmed Benabderrahmane, Ngoc Hoang, Petko Valtchev, James Cheney, Talal Rahwan

TL;DR

AE-APT is a deep learning-based tool for APT detection built on a family of AutoEncoder methods, ranging from a basic one to a Transformer-based one. In the reported evaluation, it achieved significantly higher detection rates than competing methods when detecting and ranking anomalies.

Abstract

Advanced Persistent Threats (APTs) are sophisticated, targeted cyberattacks designed to gain unauthorized access to systems and remain undetected for extended periods. To evade detection, APT attacks deceive defense layers with breaches and exploits, which makes them hard to expose with traditional anomaly detection-based security methods. Detecting APTs with machine learning is further complicated by the scarcity of relevant datasets and by the severe class imbalance in the data. We present AE-APT, a deep learning-based tool for APT detection that features a family of AutoEncoder methods ranging from a basic one to a Transformer-based one. We evaluated our tool on a suite of provenance trace databases produced by the DARPA Transparent Computing program, where APT-like attacks constitute as little as 0.004% of the data. The datasets span multiple operating systems, including Android, Linux, BSD, and Windows, and cover two attack scenarios. The results show that AE-APT achieves significantly higher detection rates than its competitors, indicating superior performance in detecting and ranking anomalies.
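
The abstract describes a family of detectors that are judged on how well they rank anomalies. The sketch below illustrates one way such a comparison could work: each model assigns an anomaly score to every record, the records are sorted by descending score, and the ranking is evaluated with nDCG. The choice of nDCG, the function names `ndcg` and `pick_winner`, the model labels, and all scores and labels are assumptions made up for illustration; they are not taken from the paper.

```python
import math

def ndcg(scores, labels):
    """nDCG of the ranking obtained by sorting records by descending score.

    labels[i] is 1 if record i is an attack and 0 otherwise.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Each attack contributes a gain discounted by the log of its rank.
    dcg = sum(labels[i] / math.log2(rank + 2) for rank, i in enumerate(order))
    # Ideal ranking: every attack sits at the very top of the list.
    ideal = sum(1 / math.log2(rank + 2) for rank in range(sum(labels)))
    return dcg / ideal if ideal else 0.0

def pick_winner(model_scores, labels):
    """Return the name of the model whose ranking has the highest nDCG."""
    return max(model_scores, key=lambda name: ndcg(model_scores[name], labels))

# Toy data (hypothetical): 8 records, 2 of which are attacks.
labels = [0, 0, 1, 0, 0, 0, 1, 0]
model_scores = {
    "basic-AE": [0.1, 0.4, 0.9, 0.2, 0.3, 0.1, 0.35, 0.2],
    "transformer-AE": [0.1, 0.2, 0.8, 0.2, 0.1, 0.1, 0.9, 0.3],
}
winner = pick_winner(model_scores, labels)
print(winner)
```

In this toy example the second model places both attacks in the top two positions, so its ranking is ideal and it is selected. With real data, the scores would typically be per-record reconstruction errors from each trained AutoEncoder.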

Paper Structure (38 sections, 8 equations, 12 figures, 4 tables)

Figures (12)

  • Figure 1: The three main steps of the APT life-cycle: From network infiltration and the expansion of the attacker’s presence to the extraction of the data.
  • Figure 2: Global architecture of the proposed pipeline AE-APT. Six neural models are trained in parallel, and the model that yields the best ranking is selected as the winner.
  • Figure 3: General architecture of the baseline AutoEncoder model (AE).
  • Figure 5: Organization of the DARPA TC datasets. Each OS undergoes two attack scenarios, each of which yields five datasets. With four OSes (BSD, Windows, Linux, Android), two attack scenarios, and five aspects (PE, PX, PP, PN, PA), there are forty individual datasets in total.
  • Figure 6: Visualization of 6 normal data points, sampled from the ProcessAll dataset of the Linux (Trace) system, Pandex scenario.
  • ...and 7 more figures
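
The dataset count given in the Figure 5 caption (four operating systems, two attack scenarios, five aspects) can be checked with a short enumeration. This is only an illustrative sketch: the scenario identifiers and the `os/scenario/aspect` naming scheme are placeholders, not names used by the paper or by the DARPA TC program.

```python
from itertools import product

OPERATING_SYSTEMS = ["BSD", "Windows", "Linux", "Android"]
SCENARIOS = ["scenario_1", "scenario_2"]  # placeholder names (assumption)
ASPECTS = ["PE", "PX", "PP", "PN", "PA"]  # abbreviations from the caption

# One dataset per (OS, scenario, aspect) combination.
datasets = [
    f"{os_name}/{scenario}/{aspect}"
    for os_name, scenario, aspect in product(OPERATING_SYSTEMS, SCENARIOS, ASPECTS)
]

print(len(datasets))  # 4 * 2 * 5 = 40
```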