DWFS-Obfuscation: Dynamic Weighted Feature Selection for Robust Malware Familial Classification under Obfuscation

Xingyuan Wei, Zijun Cheng, Ning Li, Qiujian Lv, Ziyang Yu, Degang Sun

TL;DR

This paper tackles Android malware familial classification under obfuscation. It introduces Dynamic Weighted Feature Selection (DWFS) to isolate anti-obfuscation features from a large static feature pool, then integrates these features with a Sensitive Behavior Subgraph (SBS) on a Function Call Graph and classifies malware families with Graph Neural Networks (GNNs). DWFS scores each feature by its importance on unobfuscated data and by its stability across multiple obfuscation strategies, producing a robust feature subset. Combined with the SBS and GNNs, this subset yields F1-scores of 95.56% on the unobfuscated dataset and 92.28% on the obfuscated dataset. The evaluation covers 8,664 malware samples and 44,940 obfuscated variants, and reports substantial graph simplification (≈83% fewer edges/nodes) alongside practical detection performance. Code and data are released as open source for future research. By pairing anti-obfuscation feature selection with structural graph representations, the approach targets robust, scalable familial classification in realistic obfuscation scenarios.

Abstract

Due to its open-source nature, the Android operating system has consistently been a primary target for attackers. Learning-based methods have made significant progress in the field of Android malware detection. However, traditional detection methods based on static features struggle to identify obfuscated malicious code, while methods relying on dynamic analysis suffer from low efficiency. To address this, we propose a dynamic weighted feature selection method that analyzes the importance and stability of features, calculates scores to filter out the most robust features, and combines these selected features with the program's structural information. We then utilize graph neural networks for classification, thereby improving the robustness and accuracy of the detection system. We analyzed 8,664 malware samples from eight malware families and tested a total of 44,940 malware variants generated using seven obfuscation strategies. Experiments demonstrate that our proposed method achieves an F1-score of 95.56% on the unobfuscated dataset and 92.28% on the obfuscated dataset, indicating that the model can effectively detect obfuscated malware.
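The abstract describes scoring features by importance and stability and then keeping the most robust ones. The sketch below illustrates that general idea only. The paper's actual scoring formula is not given in this section, so the linear weighting `alpha`, the stability measure (one minus the mean absolute change under obfuscation), the function names, and the toy data are all assumptions made for illustration.

```python
from statistics import mean


def stability(original, obfuscated_by_strategy):
    """Per-feature stability in [0, 1] (assumed definition, not the paper's).

    original: list of feature vectors with values in [0, 1], one per sample.
    obfuscated_by_strategy: dict mapping a strategy name to the feature
    vectors of the same samples after that obfuscation was applied.
    """
    n_features = len(original[0])
    scores = []
    for j in range(n_features):
        # Absolute change of feature j for every (strategy, sample) pair.
        diffs = [
            abs(orig_row[j] - obf_row[j])
            for variants in obfuscated_by_strategy.values()
            for orig_row, obf_row in zip(original, variants)
        ]
        scores.append(1.0 - mean(diffs))
    return scores


def select_features(importance, stability_scores, alpha=0.5, k=2):
    """Rank features by a weighted sum of importance and stability.

    alpha is a hypothetical trade-off weight; importance is rescaled so
    that the largest value becomes 1 before the two terms are combined.
    """
    top = max(importance) or 1.0
    combined = [
        alpha * (imp / top) + (1 - alpha) * stab
        for imp, stab in zip(importance, stability_scores)
    ]
    ranked = sorted(range(len(combined)), key=lambda j: combined[j], reverse=True)
    return ranked[:k], combined


# Toy data: 2 samples, 3 binary features, 2 made-up obfuscation strategies.
original = [[1, 1, 0], [1, 0, 1]]
obfuscated = {
    "renaming": [[1, 0, 0], [1, 1, 1]],
    "reflection": [[1, 0, 0], [1, 0, 1]],
}
importance = [0.4, 0.9, 0.6]  # e.g. taken from any model fit on clean data

stab = stability(original, obfuscated)
selected, scores = select_features(importance, stab, alpha=0.5, k=2)
print(stab)      # per-feature stability
print(selected)  # indices of the retained features
```

In this toy run, feature 1 has the highest importance but changes under most obfuscations, so it ranks below the two stable features and is dropped. That is the kind of trade-off an importance-plus-stability score is meant to capture.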

Paper Structure

This paper contains 22 sections, 4 equations, 1 figure, 6 tables, 2 algorithms.

Figures (1)

  • Figure 1: DWFS-Obfuscation Overview.