Unraveling the Key of Machine Learning Solutions for Android Malware Detection
Jiahao Liu, Jun Zeng, Fabio Pierazzi, Lorenzo Cavallaro, Zhenkai Liang
TL;DR
This paper undertakes the first systematic, empirical survey of ML-based Android malware detection, addressing fragmentation across security, software engineering, and ML communities. It introduces FrameDroid, a general framework that unifies APK characterization, feature encoding, and ML modeling to enable fair, end-to-end evaluation of detectors under realistic conditions, including malware evolution and adversarial perturbations. By re-implementing 12 representative approaches on a large, time-spanning AndroZoo-derived dataset, the study reveals that stronger ML models do not universally improve performance, and that feature engineering and robustness considerations are crucial for real-world effectiveness. The work provides practical guidance on balancing detection performance with efficiency and robustness, and it releases artifacts to support reproducibility and future research.
Abstract
Android malware detection serves as the front line against malicious apps. With the rapid advancement of machine learning (ML), ML-based Android malware detection has attracted increasing attention due to its capability of automatically capturing malicious patterns from Android APKs. These learning-driven methods have reported promising results in detecting malware. However, the absence of an in-depth analysis of current research progress makes it difficult to gain a holistic picture of the state of the art in this area. This paper presents a comprehensive investigation into ML-based Android malware detection, supported by empirical and quantitative analysis. We first survey the literature, categorizing contributions into a taxonomy based on the Android feature engineering and ML modeling pipeline. Then, we design a general-purpose framework for ML-based Android malware detection, re-implement 12 representative approaches from different research communities, and evaluate them along three primary dimensions: effectiveness, robustness, and efficiency. The evaluation reveals that ML-based approaches still face open challenges and yields insightful findings, for example that more powerful ML models are not a silver bullet for designing better malware detectors. We further summarize our findings and put forth recommendations to guide future research.
