MADCAT: Combating Malware Detection Under Concept Drift with Test-Time Adaptation
Eunjin Roh, Yigitcan Kaya, Christopher Kruegel, Giovanni Vigna, Sanghyun Hong
TL;DR
MADCAT tackles concept drift in malware detection through test-time adaptation with a self-supervised masked autoencoder (MAE). The model comprises an MAE encoder and a fixed classifier head, and it can optionally be combined with pseudo-labeling to balance the unlabeled test-time data. Evaluated on a seven-year Android malware dataset, MADCAT maintains stable F1 scores and outperforms a purely supervised baseline under drift; ablations highlight the importance of data balancing and moderate masking. The work demonstrates that self-supervision can robustly adapt malware detectors to evolving threats and can complement supervised methods in practical, label-scarce settings.
Abstract
We present MADCAT, a self-supervised approach designed to address the concept drift problem in malware detection. MADCAT employs an encoder-decoder architecture and adapts by training the encoder at test time on a small, balanced subset of the test-time data using a self-supervised objective. During test-time training, the model learns features that are useful for detecting both previously seen (old) samples and newly arriving ones. We demonstrate the effectiveness of MADCAT in continuous Android malware detection settings, where it consistently outperforms baseline methods in detection performance at test time. We also show the synergy between MADCAT and prior approaches to addressing concept drift in malware detection.
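The test-time training loop described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: a linear encoder and decoder stand in for the MAE, the decoder is assumed to stay fixed during adaptation, and pseudo-labels are assumed to come from thresholding the classifier's scores. The function names (`balanced_subset`, `masked_loss_and_grad`, `adapt_encoder`) and all hyperparameter values are hypothetical.

```python
import numpy as np


def balanced_subset(scores, threshold=0.5, seed=0):
    """Pick equally many pseudo-malicious and pseudo-benign sample indices.

    Assumption: pseudo-labels are obtained by thresholding classifier scores.
    """
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    pos = np.flatnonzero(scores >= threshold)
    neg = np.flatnonzero(scores < threshold)
    n = min(len(pos), len(neg))
    if n == 0:
        return np.empty(0, dtype=int)
    picked = np.concatenate([
        rng.choice(pos, size=n, replace=False),
        rng.choice(neg, size=n, replace=False),
    ])
    return np.sort(picked)


def masked_loss_and_grad(W_enc, W_dec, X, mask):
    """Masked reconstruction loss and its gradient w.r.t. the encoder only."""
    X_in = np.where(mask, 0.0, X)        # hide masked features from the encoder
    X_hat = (X_in @ W_enc) @ W_dec       # linear encode, then decode
    err = (X_hat - X) * mask             # score reconstruction on masked entries only
    n_masked = max(int(mask.sum()), 1)
    loss = float((err ** 2).sum() / n_masked)
    grad_W_enc = X_in.T @ (2.0 * err @ W_dec.T) / n_masked
    return loss, grad_W_enc


def adapt_encoder(W_enc, W_dec, X, mask_ratio=0.3, lr=0.1, steps=50, seed=0):
    """Update a copy of the encoder on unlabeled data; no labels are used."""
    rng = np.random.default_rng(seed)
    W = W_enc.copy()
    for _ in range(steps):
        mask = rng.random(X.shape) < mask_ratio   # fresh random mask each step
        _, grad = masked_loss_and_grad(W, W_dec, X, mask)
        W -= lr * grad
    return W
```

In this sketch, a caller would first score the incoming test batch with the current detector, pass those scores to `balanced_subset`, and then run `adapt_encoder` on the selected rows before classifying with the unchanged head. The mask ratio of 0.3 is only a placeholder for a moderate masking level.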
