A Visualized Malware Detection Framework with CNN and Conditional GAN

Fang Wang; Hussam Al Hamadi; Ernesto Damiani

TL;DR

This work proposes an integrated framework that addresses common problems faced by ML practitioners when developing malware detection systems. The framework is designed to preserve the identities of benign/malign samples by encoding each variable into binary digits and mapping those digits to black and white pixels.

Abstract

Malware visualization analysis combined with Machine Learning (ML) has proven to be a promising solution for improving security defenses on different platforms. In this work, we propose an integrated framework for addressing common problems experienced by ML practitioners in developing malware detection systems. Specifically, a pictorial representation system with extensions is designed to preserve the identities of benign/malign samples by encoding each variable into binary digits and mapping them into black and white pixels. A model based on a conditional Generative Adversarial Network is adopted to produce synthetic images and mitigate class imbalance. Detection models built with Convolutional Neural Networks are used to validate performance when training on datasets with and without synthetic samples. The results demonstrate accuracy rates of 98.51% and 97.26% for these two training scenarios.
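
The encoding idea described in the abstract (each variable written as binary digits, each digit drawn as a black or white pixel) can be illustrated with a minimal sketch. This is not the authors' mapping algorithm: the function names `encode_row` and `to_image`, the 8-bit width, the most-significant-bit-first order, the one-row-per-variable layout, and the choice of white (255) for a 1 bit are all assumptions made for illustration. The sample values 19 and 22 are the ones named in the Figure 1 caption.

```python
def encode_row(value, width=8):
    """Encode a non-negative integer as `width` pixels, most significant bit first.

    Assumption: a 1 bit becomes a white pixel (255) and a 0 bit a black pixel (0).
    """
    if value < 0 or value >= 2 ** width:
        raise ValueError(f"{value} does not fit in {width} bits")
    bits = format(value, f"0{width}b")  # e.g. 19 -> "00010011"
    return [255 if bit == "1" else 0 for bit in bits]


def to_image(values, width=8):
    """Stack one pixel row per variable (hypothetical layout, one row each)."""
    return [encode_row(v, width) for v in values]


if __name__ == "__main__":
    image = to_image([19, 22])
    for row in image:
        # Render white pixels as '#' and black pixels as '.' for a quick look.
        print("".join("#" if pixel else "." for pixel in row))
```

Under these assumptions every distinct value yields a distinct pixel row, so the original value can be read back from the image, which is one way to interpret the abstract's claim that sample identities are preserved.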

Paper Structure (16 sections, 5 equations, 7 figures, 1 table)

Introduction
Proposed Framework
Tabular Data Preparation and Augmentation
Preparation
Augmentation - SMOTE
Pictorial Representation System (PRS)
PRS Denotation
Mapping Algorithm
Padding
Convolutional Neural Network
Conditional Generative Adversarial Network
Experiments
Pictorial Transformation
Generating Artificial Malware Images
Comparing Malware Detection Performances
...and 1 more section

Figures (7)

Figure 1: A Demonstration of PRS as $x_{1} = 19$ and $x_{2} = 22$
Figure 2: CNN Model Architecture
Figure 3: cGAN Model Demonstration
Figure 4: Benign and Malign Samples Generated through PRS
Figure 5: The Learning Curve of cGAN Model. The top panel shows the discriminator loss for real images (blue), discriminator loss for generated fake images (orange), and the generator loss for generated fake images (green). The bottom panel shows the discriminator accuracy on real (blue) and fake (orange) images during training. On both panels the convergence is achieved at around 1500 iterations.
...and 2 more figures
