PHANTOM: Progressive High-fidelity Adversarial Network for Threat Object Modeling

Jamal Al-Karaki, Muhammad Al-Zafar Khan, Rand Derar Mohammad Al Athamneh

TL;DR

The paper tackles the scarcity of labeled cyberattack data by proposing PHANTOM, a progressive, dual-path VAE-GAN framework with domain-specific feature matching to generate high-fidelity synthetic cyberattack samples. MAV-PFM enables stable reconstruction and high-fidelity generation across multiple resolutions, preserving temporal causality and behavioral semantics. On a 100,000-sample synthetic dataset spanning five attack types, models trained on PHANTOM data achieve near real-world performance, though rare attack types remain challenging due to severe class imbalance. The work offers a privacy-preserving data augmentation approach that can bolster intrusion detection while enabling controlled experimentation and benchmarking of synthetic data methods in cybersecurity.

Abstract

The scarcity of cyberattack data hinders the development of robust intrusion detection systems. This paper introduces PHANTOM, a novel adversarial variational framework for generating high-fidelity synthetic attack data. Its innovations include progressive training, a dual-path VAE-GAN architecture, and domain-specific feature matching to preserve the semantics of attacks. Evaluated on 100,000 network traffic samples, models trained on PHANTOM data achieve 98% weighted accuracy on real attacks. Statistical analyses confirm that the synthetic data preserves authentic distributions and diversity. Limitations in generating rare attack types are noted, highlighting challenges with severe class imbalance. This work advances the generation of synthetic data for training robust, privacy-preserving detection systems.

Paper Structure

This paper contains 8 sections, 2 figures, 3 tables, and 1 algorithm.

Figures (2)

  • Figure 1: Network architecture diagram of the PHANTOM algorithm.
  • Figure 2: Top Left: Density profile comparison showing the density distributions of a representative network traffic feature for real and synthetic datasets. The close alignment between distributions indicates PHANTOM successfully captures the statistical properties of real cyberattack patterns. Top Right: Histogram distribution of Euclidean distances between each synthetic sample and its nearest neighbor in the synthetic dataset. The varied distance profile indicates diverse attack pattern generation, with distinct clusters of both densely and sparsely populated regions in the feature space. Bottom: $t$-SNE projection showing the latent space distribution of real cyberattack samples (blue) and PHANTOM-generated synthetic attacks (orange). The overlapping clusters demonstrate that the synthetic data preserves the natural separation between different attack classes while covering similar regions of the feature space.
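
The nearest-neighbor diversity check described for the top-right panel of Figure 2 can be illustrated with a small sketch. This is not the authors' code: the function name `nearest_neighbor_distances` and the toy feature vectors are assumptions made for illustration, and the brute-force search is chosen for clarity, not for the 100,000-sample scale reported in the paper.

```python
import math
from typing import List, Sequence


def nearest_neighbor_distances(samples: Sequence[Sequence[float]]) -> List[float]:
    """Return, for each sample, the Euclidean distance to its closest other sample.

    Hypothetical helper: an O(n^2) brute-force search over feature vectors.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to define a nearest neighbor")
    distances = []
    for i, a in enumerate(samples):
        # Skip index i so a sample is never matched with itself.
        nearest = min(math.dist(a, b) for j, b in enumerate(samples) if j != i)
        distances.append(nearest)
    return distances


# Toy synthetic "samples" with two features each (illustrative values only).
toy_samples = [(0.0, 0.0), (3.0, 4.0), (0.0, 1.0), (10.0, 10.0)]
nn_distances = nearest_neighbor_distances(toy_samples)
print(nn_distances)
```

A histogram of these values is what the panel plots. Distances clustered near zero would suggest near-duplicate samples, while a spread of values, as the caption reports, suggests both dense and sparse regions of the feature space are covered.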