
MORPH: Towards Automated Concept Drift Adaptation for Malware Detection

Md Tanvirul Alam, Romy Fieblinger, Ashim Mahara, Nidhi Rastogi

TL;DR

This work tackles concept drift in malware detection by introducing MORPH, a pseudo-label–based self-training method for neural networks that continuously adapts to evolving threats. MORPH combines a targeted pseudo-labeling strategy with semi-supervised training, enabling monthly updates that leverage unlabeled data and reduce annotation needs, while optionally pairing with active learning. Across Android and Windows benchmarks, MORPH improves F1 and reduces FNR compared with static baselines and outperforms prior drift-adaptation approaches like DroidEvolver++, demonstrating robust performance under drift. The study discusses practical automation limits, dataset-dependent drift, and future directions such as transformer-based, behavior-centric features to strengthen drift resilience.

Abstract

Concept drift is a significant challenge for malware detection, as the performance of trained machine learning models degrades over time, rendering them impractical. While prior research in malware concept drift adaptation has primarily focused on active learning, which involves selecting representative samples to update the model, self-training has emerged as a promising approach to mitigate concept drift. Self-training involves retraining the model using pseudo labels to adapt to shifting data distributions. In this research, we propose MORPH, an effective pseudo-label-based concept drift adaptation method specifically designed for neural networks. Through extensive experimental analysis of Android and Windows malware datasets, we demonstrate the efficacy of our approach in mitigating the impact of concept drift. Our method offers the advantage of reducing annotation efforts when combined with active learning. Furthermore, our method significantly improves over existing works in automated concept drift adaptation for malware detection.
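
The abstract describes self-training as retraining a model on its own pseudo labels. The sketch below illustrates one generic round of that idea, assuming a binary detector that outputs a malware probability per sample. It is not MORPH's actual selection strategy: the function names, the confidence rule, and the 0.9 threshold are all hypothetical choices made for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def select_pseudo_labels(
    probs: Sequence[float], threshold: float = 0.9
) -> Tuple[List[int], List[int]]:
    """Keep samples whose predicted class is confident enough.

    probs: predicted malware probabilities in [0, 1].
    threshold: hypothetical confidence cutoff (not a value from the paper).
    Returns the kept indices and their pseudo labels (1 = malware, 0 = benign).
    """
    indices, labels = [], []
    for i, p in enumerate(probs):
        # Confidence is the probability of whichever class is predicted.
        confidence = max(p, 1.0 - p)
        if confidence >= threshold:
            indices.append(i)
            labels.append(1 if p >= 0.5 else 0)
    return indices, labels


def self_training_round(
    fit: Callable[[list, list], None],
    predict_proba: Callable[[list], Sequence[float]],
    X_lab: list,
    y_lab: list,
    X_new: list,
    threshold: float = 0.9,
) -> Tuple[list, list]:
    """One generic self-training step on a new batch of unlabeled samples.

    fit and predict_proba are placeholders for whatever model is in use.
    """
    probs = predict_proba(X_new)
    idx, pseudo = select_pseudo_labels(probs, threshold)
    # Augment the labeled pool with confidently pseudo-labeled samples.
    X_aug = list(X_lab) + [X_new[i] for i in idx]
    y_aug = list(y_lab) + pseudo
    fit(X_aug, y_aug)
    return X_aug, y_aug
```

In a monthly update setting, `self_training_round` would be called once per incoming batch, with the returned pool passed on as the labeled set for the next round. Samples below the threshold are simply skipped here; they are natural candidates for manual annotation when self-training is paired with active learning.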

Paper Structure (21 sections, 2 equations, 10 figures, 1 table, 1 algorithm)


Figures (10)

  • Figure 1: Kernel Density Estimate [chen2017tutorial] plot for probability distribution on Ember dataset [2018arXiv180404637A] for correctly & incorrectly classified samples in Windows (left) and Android (right) application data.
  • Figure 2: Gradual adaptation to distribution shift for a binary classifier. The solid line represents the original classifier and the dotted lines represent the adapted classifier trained with pseudo-labels.
  • Figure 3: Proposed concept drift adaptation algorithm, MORPH
  • Figure 4: Kernel Density Estimate [chen2017tutorial] plot for probability distribution on AndroZoo [Allix:2016:ACM:2901739.2903508] dataset for (left) True Positive and True Negative samples and (right) drifted vs. not-drifted malware samples.
  • Figure 5: F1 score (left), FNR (middle), and FPR (right) for test months on AndroZoo dataset with MORPH and baseline (Static) neural network.
  • ...and 5 more figures