Efficient Black-box Adversarial Attacks via Bayesian Optimization Guided by a Function Prior

Shuyu Cheng; Yibo Miao; Yinpeng Dong; Xiao Yang; Xiao-Shan Gao; Jun Zhu

TL;DR

This work addresses black-box adversarial attacks, in which the attacker sees only the target model's outputs for queried inputs, and aims to make query-based optimization more efficient. It introduces Prior-guided Bayesian Optimization (P-BO), which uses a surrogate white-box model as a deterministic function prior within a Gaussian-process Bayesian optimization framework, and augments it with an adaptive integration strategy that weights the prior by a coefficient $\lambda$ chosen to minimize the regret bound. Theoretical analysis shows that the regret bound scales with the RKHS norm of the difference between the target function and the prior, which motivates adaptive weighting to avoid degradation when the prior is poor. Experiments on CIFAR-10, ImageNet, and vision-language models show substantial gains in attack success rate and query efficiency, with P-BO often reaching perfect or near-perfect success in far fewer queries than strong baselines. These findings highlight the practical value of global function priors for black-box attack efficiency, and the released code supports reproducibility and further exploration.

Abstract

This paper studies black-box adversarial attacks, which aim to generate adversarial examples against a black-box model using only the model's output feedback to input queries. Some previous methods improve query efficiency by incorporating the gradient of a surrogate white-box model into query-based attacks, exploiting adversarial transferability. However, the localized gradient is not informative enough, so these methods remain query-intensive. In this paper, we propose a Prior-guided Bayesian Optimization (P-BO) algorithm that leverages the surrogate model as a global function prior in black-box adversarial attacks. As the surrogate model contains rich prior information about the black-box one, P-BO models the attack objective with a Gaussian process whose mean function is initialized as the surrogate model's loss. Our theoretical analysis of the regret bound indicates that the performance of P-BO may be affected by a bad prior. Therefore, we further propose an adaptive integration strategy that automatically adjusts a coefficient on the function prior by minimizing the regret bound. Extensive experiments on image classifiers and large vision-language models demonstrate the superiority of the proposed algorithm in reducing queries and improving attack success rates compared with state-of-the-art black-box attacks. Code is available at https://github.com/yibo-miao/PBO-Attack.
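The idea of a Gaussian process whose mean function is a surrogate loss can be illustrated with a minimal sketch. It is not the authors' implementation: the RBF kernel, the one-dimensional toy functions `target_loss` and `surrogate_loss`, the helper names, and the noise level are all assumptions chosen for illustration. The sketch computes the standard GP posterior with prior mean `lam * prior_mean` and compares it with a zero-mean GP on the same three observations.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between rows of a (n, d) and b (m, d)."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def gp_posterior(x_obs, y_obs, x_new, prior_mean, lam=1.0, noise=1e-2):
    """Posterior mean and variance of GP(lam * prior_mean, k) at x_new."""
    K = rbf_kernel(x_obs, x_obs) + noise**2 * np.eye(len(x_obs))
    K_s = rbf_kernel(x_new, x_obs)
    # The GP only has to explain what the prior mean gets wrong.
    resid = y_obs - lam * prior_mean(x_obs)
    mean = lam * prior_mean(x_new) + K_s @ np.linalg.solve(K, resid)
    # k(x, x) = 1 for the RBF kernel, so the prior variance is 1.
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s.T).T, axis=1)
    return mean, np.maximum(var, 0.0)

# Hypothetical toy objective and surrogate (stand-ins for the black-box
# loss and the white-box surrogate loss; not taken from the paper).
def target_loss(x):
    return np.sin(3.0 * x[:, 0]) + 0.1 * x[:, 0]

def surrogate_loss(x):
    return np.sin(3.0 * x[:, 0])

x_obs = np.array([[-1.0], [0.0], [1.0]])   # three "queries"
y_obs = target_loss(x_obs)
x_new = np.linspace(-2.0, 2.0, 41)[:, None]

mean_prior, var_prior = gp_posterior(x_obs, y_obs, x_new, surrogate_loss, lam=1.0)
mean_zero, var_zero = gp_posterior(x_obs, y_obs, x_new, surrogate_loss, lam=0.0)

err_prior = np.mean(np.abs(mean_prior - target_loss(x_new)))
err_zero = np.mean(np.abs(mean_zero - target_loss(x_new)))
print(f"mean abs error with prior: {err_prior:.3f}, zero-mean: {err_zero:.3f}")
```

In this toy setting the surrogate is close to the target, so the prior-mean GP approximates the objective much better from the same queries. Setting `lam=0.0` recovers ordinary zero-mean Bayesian optimization modeling.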

Paper Structure

This paper contains 27 sections, 2 theorems, 32 equations, 3 figures, 10 tables, 1 algorithm.

Introduction
Preliminaries
Black-box Adversarial Attacks
Bayesian Optimization
Methodology
Prior-guided Bayesian Optimization
Adaptive Integration Strategy
Experiments
Experimental Settings
Experimental Results on CIFAR-10
Experimental Results on ImageNet
Experimental Results on Vision-Language Models
Experimental Results on Defense Models
Performance of Adaptive Integration Strategy
Conclusion
...and 12 more sections

Key Result

Theorem 3.1

(Proof in Appendix app:a-1) Assume $f$ and $f'$ lie in the Reproducing Kernel Hilbert Space (RKHS) corresponding to kernel $k$, and let $\|\cdot\|_k$ denote the RKHS norm. In Bayesian optimization, suppose we model $f$ by $\operatorname{GP}(f',k)$ with observation noise $\mathcal{N}(0, \sigma^2)$, a where $\gamma_T=\frac{1}{2}\max_{\bm{x}_1,\ldots,\bm{x}_T\in A}\log |\mathbf{I}+\sigma^{-2}\mathbf{

Figures (3)

Figure 1: An illustration of the Prior-guided Random Gradient-Free (P-RGF) [dong2022query], Bayesian Optimization (BO), and Prior-guided Bayesian Optimization (P-BO) algorithms. Previous approaches, exemplified by P-RGF, use a local gradient of the surrogate model for gradient estimation. BO typically employs a zero-mean Gaussian process to approximate the unknown objective function, without leveraging any prior information. Our proposed P-BO algorithm integrates the surrogate model as a function prior into BO, which can better approximate the objective function and thus improve the query efficiency of black-box adversarial attacks.
Figure 2: We show two adversarial examples against InstructBLIP. They mislead the VLM to output wrong descriptions.
Figure 3: The mean and standard deviation of $\lambda^*$ over the first 30 iterations of P-BO applied to different target models on CIFAR-10 and ImageNet. $\lambda^*$ on ImageNet is substantially lower than that on CIFAR-10. This implies a lower similarity between different ImageNet models, and thus the function prior might be less useful.
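A coefficient such as $\lambda^*$ can be illustrated with a minimal sketch, under an assumption that is not confirmed by the text above: the RKHS norm $\|f-\lambda f'\|_k$ is replaced by the squared norm of the minimum-norm interpolant of the observed residuals, $(\bm{y}-\lambda\bm{y}')^\top \mathbf{K}^{-1}(\bm{y}-\lambda\bm{y}')$, which is quadratic in $\lambda$ and has a closed-form minimizer. The function names, kernel, and toy data are hypothetical, and the paper's actual estimator may differ.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between rows of a (n, d) and b (m, d)."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def norm_proxy(K, y_target, y_prior, lam):
    """Proxy for ||f - lam * f'||_k^2 using observed values only (assumption)."""
    r = y_target - lam * y_prior
    return float(r @ np.linalg.solve(K, r))

def best_prior_weight(K, y_target, y_prior):
    """Closed-form minimizer of the quadratic proxy over lam."""
    k_inv_prior = np.linalg.solve(K, y_prior)
    denom = float(y_prior @ k_inv_prior)
    if denom <= 0.0:          # degenerate prior values: fall back to no prior
        return 0.0
    return float(y_target @ k_inv_prior) / denom

# Hypothetical observations: the prior explains part of the target.
x_obs = np.linspace(-2.0, 2.0, 6)[:, None]
K = rbf_kernel(x_obs, x_obs) + 1e-4 * np.eye(len(x_obs))
y_prior = np.sin(3.0 * x_obs[:, 0])
y_target = 0.5 * y_prior + 0.05 * x_obs[:, 0]

lam_star = best_prior_weight(K, y_target, y_prior)
print("lambda* =", lam_star)

# Sanity check: a target that is exactly 0.5 * prior gives 0.5
# up to floating-point error.
lam_exact = best_prior_weight(K, 0.5 * y_prior, y_prior)
```

Under this proxy, a prior that explains little of the observed target values yields a small coefficient, which is consistent with the qualitative reading of Figure 3 that a less similar surrogate should receive less weight.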

Theorems & Definitions (6)

Theorem 3.1
Remark 3.2
Proposition 3.3
Remark 3.4
proof
proof
