Malware Classification using a Hybrid Hidden Markov Model-Convolutional Neural Network
Ritik Mehta, Olha Jureckova, Mark Stamp
TL;DR
This paper tackles malware family classification in the face of evolving variants by proposing an NLP-inspired hybrid architecture that couples Hidden Markov Models (HMMs) with Convolutional Neural Networks (CNNs). It first trains seven family-specific HMMs on opcode sequences to generate hidden-state features, which are then assembled into 224×224 images and classified by a CNN. On the Malicia dataset, the proposed HMM-CNN approach outperforms several baselines, including HMM-RF (HMM-Random Forest) and SVM-based methods, demonstrating the value of HMM-derived sequential features for static malware analysis. The work highlights the potential of combining sequential modeling with deep feature learning to improve malware classification, and discusses avenues for future work such as extending the approach to obfuscated malware and improving training efficiency.
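The feature-generation step described above can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: randomly initialized parameters stand in for the seven trained HMMs, Viterbi decoding is used as one common way to obtain a hidden-state sequence, each sample is cut to 7168 opcodes so that the seven state sequences fill exactly 224 × 224 = 50176 pixels, and state indices are scaled to [0, 1]. The names `random_hmm`, `viterbi`, and `to_image` are hypothetical.

```python
import math
import random

IMAGE_SIDE = 224
NUM_FAMILY_HMMS = 7
# Assumption: 7168 opcodes per sample, so 7 state sequences fill 224 * 224 pixels.
OPCODES_PER_SAMPLE = IMAGE_SIDE * IMAGE_SIDE // NUM_FAMILY_HMMS


def random_hmm(num_states, num_symbols, rng):
    """Return (pi, A, B) with random row-stochastic rows, standing in for a trained HMM."""
    def row(n):
        weights = [rng.random() + 0.1 for _ in range(n)]
        total = sum(weights)
        return [w / total for w in weights]

    pi = row(num_states)
    A = [row(num_states) for _ in range(num_states)]   # state transitions
    B = [row(num_symbols) for _ in range(num_states)]  # opcode emissions
    return pi, A, B


def viterbi(obs, hmm):
    """Most likely hidden-state path for the observation list, in log space."""
    pi, A, B = hmm
    n = len(pi)
    score = [math.log(pi[s]) + math.log(B[s][obs[0]]) for s in range(n)]
    back = []  # back[t - 1][s] = best predecessor of state s at time t
    for o in obs[1:]:
        prev, score, ptr = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + math.log(A[r][s]))
            ptr.append(best)
            score.append(prev[best] + math.log(A[best][s]) + math.log(B[s][o]))
        back.append(ptr)
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):  # follow back-pointers from the last step
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def to_image(opcodes, hmms, side=IMAGE_SIDE):
    """Concatenate the per-HMM state paths and reshape them into a side x side grid."""
    features = []
    for hmm in hmms:
        num_states = len(hmm[0])
        path = viterbi(opcodes, hmm)
        features.extend(s / max(num_states - 1, 1) for s in path)
    # Zero-pad or truncate so the vector has exactly side * side entries.
    features = (features + [0.0] * (side * side))[: side * side]
    return [features[r * side:(r + 1) * side] for r in range(side)]


rng = random.Random(0)
NUM_SYMBOLS = 30  # hypothetical number of distinct opcodes
hmms = [random_hmm(2, NUM_SYMBOLS, rng) for _ in range(NUM_FAMILY_HMMS)]
opcodes = [rng.randrange(NUM_SYMBOLS) for _ in range(OPCODES_PER_SAMPLE)]
image = to_image(opcodes, hmms)
print(len(image), len(image[0]))
```

In a full pipeline, each such grid would be passed to a CNN classifier as a single-channel image; the ordering of the seven state sequences and the pixel scaling are design choices this sketch fixes arbitrarily.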
Abstract
The proliferation of malware variants poses significant challenges to traditional malware detection approaches, such as signature-based methods, necessitating the development of advanced machine learning techniques. In this research, we present a novel approach based on a hybrid architecture in which features extracted using a Hidden Markov Model (HMM) are passed to a Convolutional Neural Network (CNN) for malware classification. Inspired by the strong results in previous work using an HMM-Random Forest model, we propose integrating HMMs, which capture sequential patterns in opcode sequences, with CNNs, which are adept at extracting hierarchical features. We demonstrate the effectiveness of our approach on the popular Malicia dataset, where we obtain superior performance compared to other machine learning methods; in particular, our results surpass the aforementioned HMM-Random Forest model. Our findings underscore the potential of hybrid HMM-CNN architectures in bolstering malware classification capabilities, offering several promising avenues for further research in the field of cybersecurity.
