Imbalanced malware classification: an approach based on dynamic classifier selection

J. V. S. Souza, C. B. Vieira, G. D. C. Cavalcanti, R. M. O. Cruz

TL;DR

This work tackles severe class imbalance in Android malware detection by evaluating monolithic, static ensemble, and dynamic selection strategies on the Drebin dataset. It introduces Bootstrap-Based Balancing (BBB) to diversify a classifier pool and uses dynamic selection to pick competent models for each instance; KNOP applied to a pool of Random Forest classifiers obtained the best results in the evaluation. The results show that balancing improves minority-class recall and reduces instance hardness, though precision can suffer, a trade-off intrinsic to imbalanced learning. The replication code is publicly available.

Abstract

In recent years, the rise of cyber threats has emphasized the need for robust malware detection systems, especially on mobile devices. Malware, which targets vulnerabilities in devices and user data, represents a substantial security risk. A significant challenge in malware detection is dataset imbalance: most applications are benign, and only a small fraction pose a threat. This study addresses the often-overlooked issue of class imbalance in malware detection by evaluating various machine learning strategies for detecting malware in Android applications. We assess monolithic classifiers and ensemble methods, focusing on dynamic selection algorithms, which have shown superior performance compared to traditional approaches. In contrast to balancing strategies performed on the whole dataset, we propose a balancing procedure that works individually for each classifier in the pool. Our empirical analysis demonstrates that the KNOP algorithm obtained the best results using a pool of Random Forest classifiers. Additionally, an instance hardness assessment revealed that balancing reduces the difficulty of the minority class (malware) and improves its detection. The code used for the experiments is available at https://github.com/jvss2/Machine-Learning-Empirical-Evaluation.
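
The idea of balancing each pool member's training data separately, rather than balancing the whole dataset once, can be illustrated with a minimal sketch. This is not the authors' BBB implementation: the function name `balanced_bootstrap_indices`, the toy labels, and the choice to draw as many samples per class as the minority class contains are all assumptions made for illustration.

```python
import random
from collections import Counter


def balanced_bootstrap_indices(labels, n_classifiers, seed=0):
    """Return one class-balanced bootstrap sample (a list of indices) per pool member.

    Assumption (not taken from the paper): every sample draws, with
    replacement, as many indices from each class as the minority class has.
    """
    rng = random.Random(seed)

    # Group the training indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    n_per_class = min(len(ids) for ids in by_class.values())

    samples = []
    for _ in range(n_classifiers):
        sample = []
        for ids in by_class.values():
            # Independent draw for each classifier, so pool members
            # are trained on different balanced subsets.
            sample.extend(rng.choices(ids, k=n_per_class))
        rng.shuffle(sample)
        samples.append(sample)
    return samples


# Toy data: 95 benign (0) and 5 malware (1) applications.
labels = [0] * 95 + [1] * 5
pool_samples = balanced_bootstrap_indices(labels, n_classifiers=10)
for sample in pool_samples[:3]:
    print(Counter(labels[i] for i in sample))
```

Each index list would then be used to train one base classifier, so every pool member sees a balanced but different view of the data.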

Paper Structure

This paper contains 6 sections, 3 figures, 1 table.

Figures (3)

  • Figure 1: Experimental framework for empirical evaluation of malware detection models on Android devices.
  • Figure 2: Performance comparison of balancing techniques and models.
  • Figure 3: Cumulative distribution of KDN score.
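
The KDN score in Figure 3 refers to the k-Disagreeing Neighbors instance hardness measure: for each instance, the fraction of its k nearest neighbors that carry a different class label. A minimal sketch follows; the function name `kdn_scores`, the Euclidean distance, `k=3`, and the toy points are assumptions for illustration, not the settings used in the paper.

```python
import math


def kdn_scores(points, labels, k=3):
    """Compute the kDN score of every instance.

    kDN(x) = (number of the k nearest neighbors of x whose label differs
    from the label of x) / k. Higher values mean harder instances.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from instance i to every other instance.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbours = [j for _, j in dists[:k]]
        disagree = sum(labels[j] != labels[i] for j in neighbours)
        scores.append(disagree / k)
    return scores


# Toy data: one class-1 point sits inside a cluster of class-0 points.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]
print(kdn_scores(points, labels, k=3))
```

In this toy set the isolated class-1 point at (0.05, 0.05) has only class-0 neighbors and so receives the maximum score, while the points in the pure class-1 cluster score zero.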