Holographic Global Convolutional Networks for Long-Range Prediction Tasks in Malware Detection

Mohammad Mahmudul Alam; Edward Raff; Stella Biderman; Tim Oates; James Holt

TL;DR

We introduce Holographic Global Convolutional Networks (HGConv), which use the properties of Holographic Reduced Representations (HRR) to encode and decode features from sequence elements.

Abstract

Malware detection is an interesting and valuable domain to work in because it has significant real-world impact and unique machine-learning challenges. We investigate existing long-range techniques and benchmarks and find that they are poorly suited to this problem area. In this paper, we introduce Holographic Global Convolutional Networks (HGConv), which use the properties of Holographic Reduced Representations (HRR) to encode and decode features from sequence elements. Unlike other global convolutional methods, our method does not require any intricate kernel computation or crafted kernel design. HGConv kernels are defined as simple parameters learned through backpropagation. The proposed method achieves new SOTA results on the Microsoft Malware Classification Challenge, Drebin, and EMBER malware benchmarks. Empirically, HGConv, with log-linear complexity in sequence length, runs substantially faster than other methods and scales far more efficiently, even at sequence lengths $\geq 100,000$.
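The HRR operations the abstract refers to can be illustrated with a minimal NumPy sketch. It is not the HGConv layer and is not taken from the paper's code. It only shows the standard HRR binding (circular convolution) and approximate unbinding (binding with the involution of the key). The function names `bind`, `approx_inverse`, `unbind`, and `cosine`, the dimension `d = 1024`, and the random key and value vectors are all assumptions made for illustration.

```python
import numpy as np


def bind(a, b):
    """HRR binding: circular convolution, computed with the FFT."""
    n = a.shape[-1]
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=n)


def approx_inverse(a):
    """Involution a*[i] = a[-i mod d], the usual approximate HRR inverse."""
    return np.roll(a[..., ::-1], 1, axis=-1)


def unbind(s, a):
    """Approximately recover what was bound to `a` inside `s`."""
    return bind(s, approx_inverse(a))


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Hypothetical toy data: two key/value pairs with i.i.d. N(0, 1/d) entries.
rng = np.random.default_rng(0)
d = 1024
key_a, val_a, key_b, val_b = rng.normal(0.0, 1.0 / np.sqrt(d), size=(4, d))

# Superpose both bound pairs in a single d-dimensional vector.
memory = bind(key_a, val_a) + bind(key_b, val_b)

# Querying with key_a returns a noisy copy of val_a, not of val_b.
recovered = unbind(memory, key_a)
sim_match = cosine(recovered, val_a)
sim_other = cosine(recovered, val_b)
print(f"similarity to val_a: {sim_match:.2f}, to val_b: {sim_other:.2f}")
```

Because binding is computed with the FFT, each bind or unbind costs O(d log d) in the vector dimension. Retrieval is approximate: the recovered vector is only similar to the stored value, so it is typically compared against candidates or passed on to learned layers.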

Paper Structure (20 sections, 9 equations, 9 figures, 5 tables)

Introduction
Malware Detection
Efficient Transformer-Based Models
Non-Transformer Models for Sequences
Our Contributions
Methodology
Holographic Reduced Representations
Holographic Global Convolutional Networks
Algorithmic Complexity
Experiments and Results
Kaggle
Drebin
EMBER
Training
Evaluations
...and 5 more sections

Figures (9)

Figure 1: The block diagram of the proposed method. The dotted region shows a single layer of the proposed network which is repeated $N$ times. In the figure, prenorm is applied. In the case of postnorm, normalization is applied after the GLU layer before the skip connection.
Figure 2: EMBER long-sequence malware classification results. OOT and OOM stand for out-of-time and out-of-memory, and are shown for models that hit these limits beyond a particular sequence length. The figure shows a shortened comparison. A broader comparison with the additional models Linformer, Performer, and F-Net, along with numeric results, is presented in Appendix D.
Figure 3: Of the datasets in the benchmark, Drebin Apk shows the most variation in results across models. The figure shows the 3D UMAP representation of the penultimate-layer output of each model for Drebin Apk. The better separated the clusters, the higher the accuracy.
Figure 4: The correlation between accuracies on the malware benchmarks and on the LRA tasks. While accuracies on the LRA tasks are highly correlated with one another, they all correlate far worse with the malware benchmarks.
Figure 5: Kaggle Raw
...and 4 more figures
