Generating Adversarial Malware Examples for Black-Box Attacks Based on GAN
Weiwei Hu, Ying Tan
TL;DR
The paper addresses the vulnerability of ML-based malware detectors to attackers who have only black-box access. It introduces MalGAN, a GAN-based framework in which a substitute detector is trained to approximate the unknown detector and a generator produces adversarial binary API feature vectors. The generator only adds features to an original malware sample, never removes them, and is trained with gradients from the substitute detector to minimize the predicted malicious probability. Across multiple back-end detectors, MalGAN drives the true positive rate on adversarial samples to nearly zero and outperforms traditional gradient-based adversarial example generation. Retraining the detector on adversarial samples helps only temporarily, because the attacker can retrain MalGAN against the updated detector. The work shows that black-box malware detectors remain exposed to adaptive adversaries and highlights how difficult they are to defend in real-world deployment.
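The feature-addition step described above can be illustrated with a minimal sketch. It assumes that the generator emits one score in [0, 1] per API feature and that scores are binarized at a threshold of 0.5. The function name `add_features` and the sample vectors are hypothetical, chosen only for illustration; this is not the authors' implementation.

```python
def add_features(malware, generator_scores, threshold=0.5):
    """Combine a binary malware feature vector with generator output.

    Features are only ever added (element-wise OR), so every API
    feature present in the original malware is preserved.
    The 0.5 threshold is an assumption made for this sketch.
    """
    if len(malware) != len(generator_scores):
        raise ValueError("feature vectors must have the same length")
    # Binarize the generator scores into proposed extra features.
    proposed = [1 if score > threshold else 0 for score in generator_scores]
    # Element-wise OR: keep original features, add the proposed ones.
    return [m | p for m, p in zip(malware, proposed)]


# Hypothetical 5-feature example.
malware = [1, 0, 0, 1, 0]
scores = [0.1, 0.9, 0.2, 0.3, 0.7]
adversarial = add_features(malware, scores)
print(adversarial)  # [1, 1, 0, 1, 1]
```

Because the combination is an OR, the generator cannot delete a feature; a feature that is already present stays set even when its score falls below the threshold.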
Abstract
Machine learning has been used to detect new malware in recent years, and malware authors have a strong motivation to attack such algorithms. Malware authors usually have no access to the detailed structures and parameters of the machine learning models used by malware detection systems, and therefore they can only perform black-box attacks. This paper proposes a generative adversarial network (GAN) based algorithm named MalGAN to generate adversarial malware examples, which are able to bypass black-box machine-learning-based detection models. MalGAN uses a substitute detector to fit the black-box malware detection system. A generative network is trained to minimize the malicious probabilities that the substitute detector predicts for the generated adversarial examples. The advantage of MalGAN over traditional gradient-based adversarial example generation algorithms is that MalGAN can decrease the detection rate to nearly zero and make retraining-based defenses against adversarial examples difficult to apply effectively.
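The two training objectives named in the abstract can be sketched as loss functions. This is a simplified illustration under stated assumptions: the substitute detector is taken to output a malicious probability per sample, it is fit to the black-box detector's labels with binary cross-entropy, and the generator's loss is the mean log malicious probability on its adversarial examples. The function names and the `eps` clamp are hypothetical.

```python
import math


def substitute_loss(sub_probs, blackbox_labels, eps=1e-12):
    """Binary cross-entropy between the substitute detector's malicious
    probabilities and the labels returned by the black-box detector
    (1 = malware, 0 = benign). Lower means a closer fit."""
    total = 0.0
    for p, y in zip(sub_probs, blackbox_labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(sub_probs)


def generator_loss(sub_probs_on_adversarial, eps=1e-12):
    """Mean log malicious probability that the substitute detector
    assigns to adversarial examples. Minimizing it pushes those
    probabilities toward zero, i.e. toward a benign verdict."""
    logs = [math.log(max(p, eps)) for p in sub_probs_on_adversarial]
    return sum(logs) / len(logs)


# The generator is better off when the substitute sees low probabilities.
evasive = generator_loss([0.1, 0.1])
detected = generator_loss([0.9, 0.9])
print(evasive < detected)  # True
```

In a full training loop the two losses would be minimized alternately, with the substitute detector relabeled by querying the black-box detector; that loop is omitted here.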
