A Black-box Attack on Neural Networks Based on Swarm Evolutionary Algorithm
Xiaolei Liu, Yuheng Luo, Xiaosong Zhang, Qingxin Zhu
TL;DR
The paper addresses the vulnerability of neural networks to adversarial samples in black-box settings. It introduces BANA, a black-box attack built on a swarm evolutionary algorithm that optimizes pixel perturbations without gradient information, achieving near-100% success on MNIST, CIFAR-10, and ImageNet with small $L_2$ distances. The method remains effective against defensive distillation owing to the stochasticity of its search, and the results suggest that a network's robustness depends on the complexity of both the model and the dataset. In practice, this work highlights persistent security risks in image classifiers and motivates robust defenses that go beyond gradient-based or distillation-based techniques.
Abstract
Neural networks play an increasingly important role in machine learning and are used in many real-world applications. Unfortunately, neural networks are vulnerable to adversarial samples crafted to attack them. Most existing generation approaches, however, either assume that the attacker has full knowledge of the neural network model or are limited to particular types of attacked model. In this paper, we propose a new approach that mounts a black-box attack on neural networks based on a swarm evolutionary algorithm. Benefiting from advances in evolutionary computation and from the theoretical characteristics of evolutionary algorithms, our approach offers effectiveness, black-box applicability, generality, and randomness. Our experimental results show that both MNIST images and CIFAR-10 images can be perturbed to successfully generate a black-box attack with 100% probability on average. In addition, the proposed attack is resistant to defensive distillation: it succeeds on distilled neural networks with almost 100% probability. The experimental results also indicate that the robustness of an artificial intelligence algorithm is related to the complexity of the model and the data set. Finally, we find that the adversarial samples to some extent reproduce the characteristics of the sample data learned by the neural network model.
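To make the idea of a gradient-free, population-based attack concrete, the following is a minimal sketch under stated assumptions, not the authors' algorithm. The toy classifier `make_black_box`, the fitness weighting `c`, the elitist selection with uniform crossover, and all hyperparameters are hypothetical choices made for illustration. The attacker only calls `query(x)` and reads the returned probabilities.

```python
import math
import random


def make_black_box():
    """Hypothetical stand-in for a deployed classifier (an assumption of this sketch).

    The attacker may call query(x) and read the class probabilities,
    but never sees the weights or any gradient.
    """
    weights = [1.0 if i % 2 == 0 else -1.0 for i in range(16)]

    def query(x):
        z = sum(w * (v - 0.5) for w, v in zip(weights, x))
        p1 = 1.0 / (1.0 + math.exp(-z))
        return [1.0 - p1, p1]

    return query


def perturb(x, delta):
    """Add a perturbation and clip every pixel back into [0, 1]."""
    return [min(1.0, max(0.0, v + d)) for v, d in zip(x, delta)]


def l2_distance(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def fitness(query, x, delta, true_label, c=0.1):
    """Lower is better: confidence left on the true label plus an L2 penalty."""
    adv = perturb(x, delta)
    return query(adv)[true_label] + c * l2_distance(adv, x)


def evolutionary_attack(query, x, true_label, pop_size=30,
                        generations=200, sigma=0.05, seed=1):
    """Generic elitist evolutionary search over perturbations (illustrative only)."""
    rng = random.Random(seed)
    dim = len(x)
    population = [[rng.gauss(0.0, sigma) for _ in range(dim)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Rank candidates using only black-box queries.
        population.sort(key=lambda d: fitness(query, x, d, true_label))
        elite = population[: pop_size // 5]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Uniform crossover of two elite parents, then Gaussian mutation.
            child = [(ai if rng.random() < 0.5 else bi) + rng.gauss(0.0, sigma)
                     for ai, bi in zip(a, b)]
            children.append(child)
        population = elite + children
    best = min(population, key=lambda d: fitness(query, x, d, true_label))
    adv = perturb(x, best)
    return adv, l2_distance(adv, x)


if __name__ == "__main__":
    query = make_black_box()
    x = [0.6 if i % 2 == 0 else 0.4 for i in range(16)]
    probs = query(x)
    label = probs.index(max(probs))
    adv, dist = evolutionary_attack(query, x, label)
    adv_probs = query(adv)
    print("original label:", label)
    print("adversarial label:", adv_probs.index(max(adv_probs)))
    print("L2 distance:", round(dist, 3))
```

The fitness combines the two goals named in the summary: lowering the confidence in the true class and keeping the $L_2$ distance small. The randomness of the mutation step means different seeds generally yield different adversarial samples for the same input.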
