LFreeDA: Label-Free Drift Adaptation for Windows Malware Detection
Adrian Shuai Li, Elisa Bertino
TL;DR
LFreeDA addresses concept drift in Windows malware detection without requiring target labels. It combines three steps: unsupervised domain adaptation on malware image representations to generate pseudo-labels, a conservative per-class pseudo-label selection step, and a final adaptation phase that uses the selected pseudo-labels on image and CFG representations via AdvDA or warm-start strategies. On the real-world MB-24+ dataset, LFreeDA improves over no-adaptation baselines by up to 12.6% in accuracy and 11.1% in F1, comes within 4% accuracy and 3.4% F1 of fully supervised upper bounds, and matches state-of-the-art methods that are given ground-truth labels for 300 target samples. Additional results on the controlled BIG-15 and MalwareDrift benchmarks confirm that detection performance holds up as malware evolves. The findings also provide practical deployment guidance: when LFreeDA is most effective, how to balance pseudo-label quality against coverage, how robust it is to obfuscation, and how it scales to real-world malware evolution.
Abstract
Machine learning (ML)-based malware detectors degrade over time as concept drift introduces new and evolving families unseen during training. Retraining is limited by the cost and time of manual labeling or sandbox analysis. Existing approaches mitigate this via drift detection and selective labeling, but fully label-free adaptation remains largely unexplored. Recent self-training methods use a previously trained model to generate pseudo-labels for unlabeled data and then train a new model on these labels. The unlabeled data are used only for inference and do not participate in training the earlier model. We argue that these unlabeled samples still carry valuable information that can be leveraged when incorporated appropriately into training. This paper introduces LFreeDA, an end-to-end framework that adapts malware classifiers to drift without manual labeling or drift detection. LFreeDA first performs unsupervised domain adaptation on malware images, jointly training on labeled and unlabeled samples to infer pseudo-labels and prune noisy ones. It then adapts a classifier on CFG representations using the labeled and selected pseudo-labeled data, leveraging the scalability of images for pseudo-labeling and the richer semantics of CFGs for final adaptation. Evaluations on the real-world MB-24+ dataset show that LFreeDA improves accuracy by up to 12.6% and F1 by 11.1% over no-adaptation lower bounds, and is only 4% and 3.4% below fully supervised upper bounds in accuracy and F1, respectively. It also matches the performance of state-of-the-art methods provided with ground truth labels for 300 target samples. Additional results on two controlled-drift benchmarks further confirm that LFreeDA maintains malware detection performance as malware evolves without human labeling.
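The pruning step mentioned above (inferring pseudo-labels and discarding noisy ones) can be illustrated with a minimal sketch. The abstract does not specify LFreeDA's actual selection rule, so the following is only one plausible reading under stated assumptions: each unlabeled target sample receives the argmax class of the adapted image model as its pseudo-label, and within each predicted class only the most confident fraction is kept. The function name `select_pseudo_labels` and the parameters `keep_fraction` and `min_confidence` are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def select_pseudo_labels(probs, keep_fraction=0.5, min_confidence=0.0):
    """Return {sample_index: pseudo_label} for confident samples, chosen per class.

    probs: list of per-sample class-probability lists, e.g. softmax outputs of
           a domain-adapted model on unlabeled target samples (assumed input).
    keep_fraction: hypothetical knob, fraction of each predicted class to keep.
    min_confidence: hypothetical floor; samples below it are always dropped.
    """
    # Group samples by predicted class, remembering each sample's confidence.
    by_class = defaultdict(list)
    for idx, p in enumerate(probs):
        label = max(range(len(p)), key=p.__getitem__)  # argmax over classes
        by_class[label].append((p[label], idx))

    selected = {}
    for label, scored in by_class.items():
        scored.sort(reverse=True)  # most confident first
        # Consider at least one candidate per predicted class.
        n_keep = max(1, int(len(scored) * keep_fraction))
        for conf, idx in scored[:n_keep]:
            if conf >= min_confidence:
                selected[idx] = label
    return selected
```

Selecting per class rather than with a single global threshold keeps rare or newly drifting families from being crowded out by classes the model is systematically more confident about. Lowering `keep_fraction` trades coverage for pseudo-label quality. For example, with `probs = [[0.95, 0.05], [0.60, 0.40], [0.10, 0.90], [0.45, 0.55]]` and `keep_fraction=0.5`, the sketch keeps sample 0 as class 0 and sample 2 as class 1.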
