Reassessing feature-based Android malware detection in a contemporary context

Ali Muzaffar; Hani Ragab Hassen; Hind Zantout; Michael A Lones

TL;DR

The paper asks whether feature-based Android malware detection remains effective in a contemporary setting, given larger feature spaces and evolving malware. It reimplements 18 foundational studies on a level playing field, using a large, balanced dataset and a standardized feature extraction pipeline, and covers static, dynamic, and ensemble analyses. Feature-based methods can still exceed 98% accuracy. Static features (notably API calls and opcodes) often outperform dynamic features; the exception is network traffic, which is highly predictive but costly to collect. Ensembles can match network-based performance with more practical feature sets. The work underscores the continued relevance of simple, fast ML approaches, highlights ML-practice pitfalls that may have inflated earlier results, and provides shared datasets and tools to improve reproducibility and future benchmarking in Android malware research.

Abstract

We report the findings of a reimplementation of 18 foundational studies in feature-based machine learning for Android malware detection, published during the period 2013-2023. These studies are reevaluated on a level playing field using a contemporary Android environment and a balanced dataset of 124,000 applications. Our findings show that feature-based approaches can still achieve detection accuracies beyond 98%, despite a considerable increase in the size of the underlying Android feature sets. We observe that features derived through dynamic analysis yield only a small benefit over those derived from static analysis, and that simpler models often outperform more complex models. We also find that API calls and opcodes are the most productive static features within our evaluation context, that network traffic is the most predictive dynamic feature, and that ensemble models provide an efficient means of combining models trained on static and dynamic features. Together, these findings suggest that simple, fast machine learning approaches can still be an effective basis for malware detection, despite the increasing focus on slower, more expensive machine learning models in the literature.
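
The abstract notes that ensemble models can combine models trained on static and dynamic features. As a rough illustration only, the sketch below averages per-model malware probabilities (soft voting) and thresholds the result. The combination rule, the function names `soft_vote` and `classify`, the 0.5 threshold, and the example probabilities are all assumptions made for this sketch; they are not taken from the paper and do not describe the authors' ensemble.

```python
from typing import Dict, Optional


def soft_vote(model_probs: Dict[str, float],
              weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted mean of per-model malware probabilities for one app."""
    if not model_probs:
        raise ValueError("need at least one model probability")
    if weights is None:
        # Default: every model gets an equal vote.
        weights = {name: 1.0 for name in model_probs}
    total = sum(weights[name] for name in model_probs)
    return sum(weights[name] * p for name, p in model_probs.items()) / total


def classify(model_probs: Dict[str, float],
             threshold: float = 0.5,
             weights: Optional[Dict[str, float]] = None) -> str:
    """Label an app as malware if the combined probability reaches the threshold."""
    return "malware" if soft_vote(model_probs, weights) >= threshold else "benign"


# Hypothetical outputs of three separately trained models (two static,
# one dynamic) for two apps; the values are made up for illustration.
apps = {
    "app_a": {"api_calls": 0.9, "opcodes": 0.8, "network_traffic": 0.4},
    "app_b": {"api_calls": 0.2, "opcodes": 0.3, "network_traffic": 0.6},
}
labels = {name: classify(probs) for name, probs in apps.items()}
print(labels)
```

Passing a `weights` dictionary lets a more trusted model count for more, which is one simple way to favour a cheap static model over a costly dynamic one.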

Paper Structure (38 sections, 9 figures, 33 tables)

This paper contains 38 sections, 9 figures, 33 tables.

Abstract
Introduction
Background
Static approaches
Dynamic and hybrid approaches
End-to-end approaches
Challenges
Methodology
Choice of studies to reimplement
Inclusion of core ML models and feature selection algorithms
Dataset collection
Feature extraction process
Model evaluation procedure
Reimplementation and extension of static modelling approaches
Use of permissions and API call features
...and 23 more sections

Figures (9)

Figure 1: Overview of the methodology, showing section numbers containing results
Figure 2: Overview of the DroidDissector feature extraction tool used in this study
Figure 3: Relationship between number of permissions used and model accuracy when permissions are ranked using different methods
Figure 4: Effect of changing the variance threshold upon accuracy of random forest models. Also shows the number of permissions selected for different variance thresholds.
Figure 5: Relationship between number of API calls used and model accuracy for API usage features ranked using various methods
...and 4 more figures
