Generating Adversarial Malware Examples for Black-Box Attacks Based on GAN

Weiwei Hu, Ying Tan

TL;DR

The paper targets the vulnerability of ML-based malware detectors under black-box access by introducing MalGAN, a GAN-based framework that uses a substitute detector to approximate the unknown detector and a generator to produce adversarial binary API features. The generator adds features to original malware samples, guided by gradient signals from the substitute detector, to minimize the detected malicious probability. Across multiple back-end detectors, MalGAN achieves near-zero true positive rates for adversarial samples and outperforms gradient-based white-box attacks, while retraining defenses show only temporary effectiveness due to rapid distributional shifts. The work demonstrates a dynamic vulnerability in black-box malware detection and highlights the challenge of defending against adaptive adversaries in real-world deployment.

Abstract

Machine learning has been used to detect new malware in recent years, while malware authors have strong motivation to attack such algorithms. Malware authors usually have no access to the detailed structures and parameters of the machine learning models used by malware detection systems, and therefore they can only perform black-box attacks. This paper proposes a generative adversarial network (GAN) based algorithm named MalGAN to generate adversarial malware examples, which are able to bypass black-box machine-learning-based detection models. MalGAN uses a substitute detector to fit the black-box malware detection system. A generative network is trained to minimize the malicious probabilities that the substitute detector predicts for the generated adversarial examples. The advantage of MalGAN over traditional gradient-based adversarial example generation algorithms is that MalGAN is able to decrease the detection rate to nearly zero and makes it hard for retraining-based defenses against adversarial examples to work.
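The TL;DR notes that the generator only adds features to an original malware sample's binary API feature vector. A minimal sketch of that add-only constraint is shown here on toy data. The function name `add_features`, the 0.5 threshold, and the use of an element-wise OR are illustrative assumptions, not details given in this summary.

```python
import numpy as np

def add_features(original_features, proposed_scores, threshold=0.5):
    """Combine a binary feature vector with proposed additions (hypothetical helper).

    original_features: 0/1 vector of features already present in the sample.
    proposed_scores:   real-valued scores in [0, 1], one per feature
                       (assumed here to come from some generator).
    threshold:         assumed cut-off for turning a score into a 0/1 decision.
    """
    present = np.asarray(original_features).astype(bool)
    proposed = np.asarray(proposed_scores, dtype=float) >= threshold
    # Element-wise OR: a feature can be switched on, never switched off,
    # so every feature of the original sample is preserved.
    return (present | proposed).astype(np.uint8)

original = np.array([1, 0, 0, 1])
scores = np.array([0.1, 0.9, 0.2, 0.0])
combined = add_features(original, scores)
```

Because the combination is an OR, the result always contains the original vector's features, which reflects the idea that removing features could break the original program's behavior.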

Paper Structure

This paper contains 12 sections, 3 equations, 3 figures, 3 tables, and 1 algorithm.

Figures (3)

  • Figure 1: The architecture of MalGAN.
  • Figure 2: The change of the true positive rate on the training set and the validation set over time. Random forest is used as the black-box detector here. The vertical axis represents the true positive rate while the horizontal axis represents epoch.
  • Figure 3: True positive rate on the adversarial examples over the iterative process when using the algorithm proposed by Grosse et al.
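
Figures 2 and 3 both report the true positive rate (TPR), that is, the fraction of malicious samples that the detector flags as malicious. The sketch below shows how that metric can be computed from labels and predictions. The helper name `true_positive_rate` and the toy arrays are assumptions for illustration and do not come from the paper.

```python
import numpy as np

def true_positive_rate(y_true, y_pred):
    """Fraction of truly malicious samples (label 1) predicted as malicious."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    positives = int(y_true.sum())
    if positives == 0:
        raise ValueError("TPR is undefined when there are no positive samples")
    detected = int((y_true & y_pred).sum())
    return detected / positives

# Toy data: four malicious samples (1) and two benign samples (0).
labels = [1, 1, 1, 1, 0, 0]
predictions = [1, 0, 0, 0, 1, 0]
tpr = true_positive_rate(labels, predictions)
```

A TPR near zero on adversarial examples, as the TL;DR reports, means the detector flags almost none of the malicious samples.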