Machine Learning for Windows Malware Detection and Classification: Methods, Challenges and Ongoing Research

Daniel Gibert

TL;DR

This chapter introduces the main components of a machine learning pipeline, highlights the challenges of collecting and maintaining up-to-date datasets, and presents the primary challenges faced by machine learning-based malware detectors, including concept drift and adversarial attacks.

Abstract

In this chapter, readers will explore how machine learning has been applied to build malware detection systems for the Windows operating system. The chapter starts by introducing the main components of a machine learning pipeline, highlighting the challenges of collecting and maintaining up-to-date datasets. It then presents various state-of-the-art malware detectors, covering both feature-based and deep learning-based approaches. Subsequent sections introduce the primary challenges faced by machine learning-based malware detectors, including concept drift and adversarial attacks. Finally, the chapter concludes with a brief overview of ongoing research on adversarial defenses.

Paper Structure (25 sections, 1 equation, 16 figures, 2 tables)

Building a Machine Learning-based Malware Detector from Scratch: Main Components
Data Collection
Datasets
Data Preprocessing
Model Training and Model Evaluation
Static Machine Learning-based Malware Detectors
Feature-based Detectors
EMBER LightGBM Model
Novel Feature Extraction, Selection and Fusion for Effective Malware Family Classification
Deep Learning-based Detectors
Byte-based Detectors
Assembly Language Instructions-based Detectors
Visualization Techniques
Grayscale Image Representation
Structural Entropy Representation

Figures (16)

Figure 1: A graphical depiction of the PE file format.
Figure 2: MalConv architecture DBLP:conf/aaai/RaffBSBCN18.
Figure 3: AvastConv architecture.
Figure 4: ShallowConv architecture DBLP:conf/ccia/GibertBMPSV17, GIBERT2021102159.
Figure 5: Grayscale image representation of malware binaries belonging to the Kelihos_ver1, Obfuscator.ACY and Gatak families, respectively DBLP:journals/virology/GibertMPV19.