Decoding Android Malware with a Fraction of Features: An Attention-Enhanced MLP-SVM Approach
Safayat Bin Hakim, Muhammad Adil, Kamal Acharya, Houbing Herbert Song
TL;DR
This paper tackles Android malware detection and family classification in the face of highly obfuscated and evolving threats. It introduces a hybrid pipeline that combines an attention-enhanced MLP for robust representation learning with a Radial Basis Function (RBF) SVM. After an initial selection of 47 features from the CCCS-CIC-AndMal-2020 dataset, Linear Discriminant Analysis (LDA) reduces the feature dimensionality from 512 to 14. The approach achieves over 99% accuracy with this compact representation and provides SHAP-based explanations for model interpretability. The work shows clear performance gains over state-of-the-art methods, with implications for scalable, efficient, and explainable mobile threat detection in real-world deployments.
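The LDA-to-RBF-SVM stage of this pipeline can be illustrated with a minimal scikit-learn sketch. It rests on several assumptions that are not from the paper: the data are synthetic (`make_classification`), not CCCS-CIC-AndMal-2020; the attention-enhanced MLP stage is omitted, so LDA is applied directly to the 47 inputs; 15 classes are assumed because LDA yields at most (number of classes - 1) components, so 14 components require at least 15 classes; and `C=1.0`, `gamma="scale"` are library defaults, not the authors' hyperparameters.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 47 selected features (NOT the real dataset).
N_FEATURES = 47
N_CLASSES = 15  # assumption: LDA gives at most N_CLASSES - 1 = 14 components

X, y = make_classification(
    n_samples=3000,
    n_features=N_FEATURES,
    n_informative=10,
    n_classes=N_CLASSES,
    n_clusters_per_class=1,
    class_sep=2.0,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Standardize -> project onto 14 discriminant axes -> RBF-kernel SVM.
model = make_pipeline(
    StandardScaler(),
    LinearDiscriminantAnalysis(n_components=N_CLASSES - 1),
    SVC(kernel="rbf", C=1.0, gamma="scale"),
)
model.fit(X_train, y_train)

# model[:-1] is the sub-pipeline of transformers (scaler + LDA).
n_components = model[:-1].transform(X_test).shape[1]
accuracy = model.score(X_test, y_test)
print(n_components)  # 14
print(f"test accuracy on synthetic data: {accuracy:.3f}")
```

The printed accuracy reflects only the synthetic data and says nothing about the paper's results. The RBF kernel used by `SVC` is K(x, x') = exp(-gamma * ||x - x'||^2), which corresponds to an implicit mapping into an infinite-dimensional feature space; this is what allows a nonlinear decision boundary over only 14 LDA components.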
Abstract
The escalating sophistication of Android malware poses significant challenges to traditional detection methods, calling for approaches that can identify and classify threats efficiently and with high precision. This paper introduces a framework that integrates an attention-enhanced Multi-Layer Perceptron (MLP) with a Support Vector Machine (SVM) to improve Android malware detection and classification. Using only 47 of the more than 9,760 features available in the CCCS-CIC-AndMal-2020 dataset, our MLP-SVM model achieves over 99% accuracy in identifying malicious applications. The attention mechanism lets the MLP focus on the most discriminative features, and Linear Discriminant Analysis (LDA) further reduces the 47 features to only 14 components. Despite this substantial reduction in dimensionality, the SVM component, equipped with a Radial Basis Function (RBF) kernel, implicitly maps these components into a high-dimensional space, enabling precise classification of samples into their respective malware families. Rigorous evaluations using accuracy, precision, recall, and F1-score confirm the superiority of our approach over existing state-of-the-art techniques. By relying on a compact feature set, the proposed framework significantly reduces computational complexity while also exhibiting resilience against the evolving Android malware landscape.
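The abstract does not specify how the attention mechanism is formulated. The sketch below shows one common choice, a softmax feature-attention gate that rescales each of the 47 inputs by a weight in (0, 1). The function `feature_attention` and the parameters `w` and `b` are hypothetical; here they are randomly initialized, whereas in a trained model they would be learned jointly with the MLP.

```python
import numpy as np


def feature_attention(x: np.ndarray, w: np.ndarray, b: np.ndarray):
    """Reweight each input feature by a softmax attention score.

    x: (batch, d) inputs; w: (d, d) and b: (d,) are hypothetical learned
    parameters. Returns (reweighted inputs, attention weights).
    """
    scores = x @ w + b                                    # (batch, d) raw scores
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    alpha = np.exp(scores)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)      # softmax over features
    return x * alpha, alpha


rng = np.random.default_rng(0)
d = 47  # number of selected features reported in the abstract
x = rng.normal(size=(4, d))
w = rng.normal(scale=0.1, size=(d, d))  # hypothetical, untrained weights
b = np.zeros(d)
weighted, alpha = feature_attention(x, w, b)
print(weighted.shape)  # (4, 47)
```

Because each row of `alpha` sums to 1, averaging `alpha` over samples gives a rough per-feature emphasis score; this is distinct from the SHAP-based explanations the paper reports.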
