MAGE-ID: A Multimodal Generative Framework for Intrusion Detection Systems

Mahdi Arab Loodaricheh, Mohammad Hossein Manshaei, Anita Raja

TL;DR

MAGE-ID introduces a two-stage multimodal diffusion framework that jointly models tabular network flow features and their DeepInsight-transformed images via a unified latent prior. By training Transformer- and CNN-based VAEs and an EDM-style denoiser on concatenated latents, the method produces coherent, balanced synthetic intrusion samples across modalities. Evaluations on CIC-IDS-2017 and NSL-KDD show improved fidelity, diversity, and downstream utility over unimodal diffusion baselines, with strong cross-modal coherence and robustness. The approach offers practical benefits for augmenting IDS datasets and guiding future multimodal cyber-defense research.

Abstract

Modern Intrusion Detection Systems (IDS) face severe challenges due to heterogeneous network traffic, evolving cyber threats, and pronounced data imbalance between benign and attack flows. While generative models have shown promise in data augmentation, existing approaches are limited to single modalities and fail to capture cross-domain dependencies. This paper introduces MAGE-ID (Multimodal Attack Generator for Intrusion Detection), a diffusion-based generative framework that couples tabular flow features with their transformed images through a unified latent prior. By jointly training Transformer- and CNN-based variational encoders with an EDM-style denoiser, MAGE-ID achieves balanced and coherent multimodal synthesis. Evaluations on CIC-IDS-2017 and NSL-KDD demonstrate significant improvements in fidelity, diversity, and downstream detection performance over TabSyn and TabDDPM, highlighting the effectiveness of MAGE-ID for multimodal IDS augmentation.
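
To make the "unified latent prior" idea more concrete, the following is a minimal sketch, not the authors' implementation. It assumes each sample's tabular and image latents are plain vectors that are concatenated into one joint vector, and it uses the noise-level distribution and preconditioning coefficients from the general EDM formulation. The function names, the latent sizes, and the defaults `p_mean=-1.2`, `p_std=1.2`, and `sigma_data=0.5` are illustrative assumptions; this summary does not state which values MAGE-ID uses, and the denoiser network itself is omitted.

```python
import math
import random


def concat_latents(z_tab, z_img):
    """Join one sample's tabular and image latents into a single joint latent."""
    return list(z_tab) + list(z_img)


def split_latents(z_joint, d_tab):
    """Split a joint latent back into its tabular and image parts."""
    return z_joint[:d_tab], z_joint[d_tab:]


def sample_sigma(rng, p_mean=-1.2, p_std=1.2):
    """Draw a noise level with ln(sigma) ~ N(p_mean, p_std^2) (assumed defaults)."""
    return math.exp(rng.gauss(p_mean, p_std))


def add_noise(z_joint, sigma, rng):
    """Perturb every coordinate of the joint latent with Gaussian noise of scale sigma."""
    return [z + sigma * rng.gauss(0.0, 1.0) for z in z_joint]


def edm_coefficients(sigma, sigma_data=0.5):
    """EDM-style preconditioning coefficients for
    D(x, sigma) = c_skip * x + c_out * F(c_in * x, c_noise)."""
    total = sigma ** 2 + sigma_data ** 2
    c_skip = sigma_data ** 2 / total
    c_out = sigma * sigma_data / math.sqrt(total)
    c_in = 1.0 / math.sqrt(total)
    c_noise = math.log(sigma) / 4.0
    return c_skip, c_out, c_in, c_noise


# Hypothetical latents for one flow: 3 tabular dims and 2 image dims.
rng = random.Random(0)
z_tab = [0.1, -0.3, 0.7]
z_img = [0.5, 0.2]

z_joint = concat_latents(z_tab, z_img)
sigma = sample_sigma(rng)
z_noisy = add_noise(z_joint, sigma, rng)
c_skip, c_out, c_in, c_noise = edm_coefficients(sigma)
```

Because a single noise level perturbs the whole joint vector, a denoiser trained on it has to recover both modalities together. In this sketch, that is where the cross-modal coupling would come from. After sampling, `split_latents` would hand each part to its own decoder.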

Paper Structure

This paper contains 12 sections, 1 equation, 2 figures, 1 table.

Figures (2)

  • Figure 1: Overview of the proposed MAGE-ID framework. Stage 1 trains Transformer- and CNN-based VAEs on tabular and transformed images, while Stage 2 models a joint latent space with an EDM-style denoiser for coherent multimodal generation.
  • Figure 2: Radar chart comparison of TabSyn, TabDDPM, and MAGE-ID on the CIC-IDS-2017 dataset across Detectability, Precision, Recall, Density, and Coverage metrics.