Efficient Black-box Adversarial Attacks via Bayesian Optimization Guided by a Function Prior
Shuyu Cheng, Yibo Miao, Yinpeng Dong, Xiao Yang, Xiao-Shan Gao, Jun Zhu
TL;DR
This work tackles black-box adversarial attacks that rely solely on input-output feedback, with the goal of making query-based optimization more efficient. It introduces Prior-guided Bayesian Optimization (P-BO), which uses the loss of a surrogate white-box model as a deterministic function prior within a Gaussian-process Bayesian optimization framework, and augments it with an adaptive integration strategy that weights the prior by a coefficient $\lambda$ chosen to minimize the regret bound. Theoretical analysis shows that the regret bound scales with the RKHS norm of the difference between the target function and the prior, which motivates adaptive weighting to avoid degradation when the prior is poor. Experiments on CIFAR-10, ImageNet, and vision-language models show substantial gains in attack success rate and query efficiency, with P-BO often reaching perfect or near-perfect success rates in far fewer queries than strong baselines. These findings highlight the practical value of global function priors for black-box attack efficiency, and the released code supports reproducibility and further exploration.
Abstract
This paper studies the challenging black-box adversarial attack setting, in which adversarial examples are generated against a black-box model using only the model's output feedback to input queries. Some previous methods improve query efficiency by exploiting adversarial transferability and incorporating the gradient of a surrogate white-box model into query-based attacks. However, the localized gradient is not informative enough, so these methods remain query-intensive. In this paper, we propose a Prior-guided Bayesian Optimization (P-BO) algorithm that leverages the surrogate model as a global function prior in black-box adversarial attacks. As the surrogate model contains rich prior information about the black-box model, P-BO models the attack objective with a Gaussian process whose mean function is initialized as the surrogate model's loss. Our theoretical analysis of the regret bound indicates that the performance of P-BO may be degraded by a bad prior. Therefore, we further propose an adaptive integration strategy that automatically adjusts a coefficient on the function prior by minimizing the regret bound. Extensive experiments on image classifiers and large vision-language models demonstrate that the proposed algorithm reduces the number of queries and improves attack success rates compared with state-of-the-art black-box attacks. Code is available at https://github.com/yibo-miao/PBO-Attack.
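To make the idea of a function prior concrete, the following is a minimal sketch of a Gaussian process whose mean function is a scaled prior, $\lambda f'(x)$, together with one plausible way to pick $\lambda$ from the queried data. It is not taken from the released code. The names `rbf_kernel`, `fit_lambda`, and `posterior` are hypothetical, the RBF kernel and noise level are assumptions, and the choice of $\lambda$ here minimizes the quadratic form $(y - \lambda p)^\top K^{-1} (y - \lambda p)$ as a data-based stand-in for the RKHS norm of the residual; the paper's actual adaptive strategy may differ.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0):
    """Unit-variance RBF kernel matrix between rows of A and rows of B (assumed kernel)."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(d2, 0.0) / lengthscale**2)


def fit_lambda(K, y, p):
    """Closed-form minimizer of (y - lam*p)^T K^{-1} (y - lam*p) over lam.

    y: observed objective values at the queried points.
    p: prior (surrogate loss) values at the same points.
    Assumption: this quadratic form is used as a proxy for the RKHS norm of f - lam*f'.
    """
    Kinv_y = np.linalg.solve(K, y)
    Kinv_p = np.linalg.solve(K, p)
    denom = p @ Kinv_p
    # Fall back to ignoring the prior if it is (numerically) zero on the data.
    return float(p @ Kinv_y / denom) if denom > 1e-12 else 0.0


def posterior(X, y, prior, X_new, lam, noise=1e-6, lengthscale=1.0):
    """GP posterior mean and variance with mean function lam * prior(x)."""
    K = rbf_kernel(X, X, lengthscale) + noise * np.eye(len(X))
    K_s = rbf_kernel(X_new, X, lengthscale)
    # The GP only has to model the residual between the target and the scaled prior.
    resid = y - lam * prior(X)
    mean = lam * prior(X_new) + K_s @ np.linalg.solve(K, resid)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s.T).T, axis=1)
    return mean, np.maximum(var, 0.0)
```

With `lam = 0` this reduces to a standard zero-mean GP, so a poor prior can be switched off; when the prior matches the target up to a scale factor, the residual vanishes and the posterior mean follows the rescaled prior everywhere, including at points that have not been queried.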
