Learning to Attack: A Bandit Approach to Adversarial Context Poisoning

Ray Telikani; Amir H. Gandomi

TL;DR

AdvBandit is a black-box adaptive attack that formulates context poisoning as a continuous-armed bandit problem, enabling the attacker to jointly learn and exploit the victim's evolving policy.

Abstract

Neural contextual bandits are vulnerable to adversarial attacks, where subtle perturbations to rewards, actions, or contexts induce suboptimal decisions. We introduce AdvBandit, a black-box adaptive attack that formulates context poisoning as a continuous-armed bandit problem, enabling the attacker to jointly learn and exploit the victim's evolving policy. The attacker requires no access to the victim's internal parameters, reward function, or gradient information; instead, it constructs a surrogate model using a maximum-entropy inverse reinforcement learning module from observed context-action pairs and optimizes perturbations against this surrogate using projected gradient descent. An upper confidence bound-aware Gaussian process guides arm selection. An attack-budget control mechanism is also introduced to limit detection risk and overhead. We provide theoretical guarantees, including sublinear attacker regret and lower bounds on victim regret linear in the number of attacks. Experiments on three real-world datasets (Yelp, MovieLens, and Disin) against various victim contextual bandits demonstrate that our attack model achieves higher cumulative victim regret than state-of-the-art baselines.
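The perturbation step described in the abstract (projected gradient descent against a surrogate of the victim) can be illustrated with a minimal sketch. This is not the authors' implementation: the linear surrogate, the $\ell_\infty$ projection, the margin-style loss, and the names `surrogate_scores` and `pgd_perturbation` are all assumptions made for illustration. The paper's surrogate is learned with maximum-entropy inverse reinforcement learning, which is replaced here by fixed per-arm weight vectors.

```python
from typing import List, Sequence


def surrogate_scores(weights: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Score of each arm under a toy LINEAR surrogate (assumption, not the paper's model)."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]


def pgd_perturbation(
    weights: Sequence[Sequence[float]],
    x: Sequence[float],
    target: int,
    eps: float,
    step: float = 0.04,
    iters: int = 50,
) -> List[float]:
    """Search for delta with max|delta_i| <= eps that makes `target` the top-scoring arm.

    Loss (assumed): score of the strongest rival arm minus score of the target arm.
    """
    delta = [0.0] * len(x)
    arms = range(len(weights))
    for _ in range(iters):
        x_adv = [x_i + d_i for x_i, d_i in zip(x, delta)]
        s = surrogate_scores(weights, x_adv)
        # Strongest competitor of the target arm at the current perturbed context.
        rival = max((a for a in arms if a != target), key=lambda a: s[a])
        if s[target] > s[rival]:
            break  # the surrogate already prefers the target arm
        # Gradient of (rival score - target score) with respect to the context.
        grad = [w_r - w_t for w_r, w_t in zip(weights[rival], weights[target])]
        # Gradient step, then projection onto the l-infinity ball of radius eps.
        delta = [min(eps, max(-eps, d_i - step * g_i)) for d_i, g_i in zip(delta, grad)]
    return delta


if __name__ == "__main__":
    w = [[1.0, 0.0], [0.0, 1.0]]   # two arms, two context features (toy values)
    context = [0.6, 0.5]           # the surrogate prefers arm 0 on the clean context
    d = pgd_perturbation(w, context, target=1, eps=0.1)
    print(d, surrogate_scores(w, [c + di for c, di in zip(context, d)]))
```

When the budget `eps` is too small to flip the surrogate's preference, the loop simply returns the best perturbation found inside the ball, which is one reason a separate mechanism for deciding when to attack is useful.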

Paper Structure (67 sections, 26 theorems, 95 equations, 7 figures, 12 tables, 2 algorithms)

Introduction
Related Works
Problem Statement
Attack Setting.
Periodical Attack Model
Surrogate Modeling
Reward and Uncertainty Networks.
UCB-Aware Surrogate Policy.
Training Objective.
Context Feature Extraction
Query Selection
Attacker arm Selection
Gaussian Process Model.
Perturbation Generation
Attack loss $\mathcal{L}_{\mathrm{eff}}$.
...and 52 more sections

Key Result

Theorem 5.2

With attack budget $B$, perturbation bound $\epsilon$, and IRL retraining interval $\Delta_{\textsc{irl}}$ with window size $W$, the victim's cumulative regret satisfies with probability at least $1-\rho$: where $\bar{\alpha} = \frac{1}{B}\sum_{t:\,z_t=1}\![\Delta(\mathbf{x}_t, a_t^\dagger) - 2L_h\epsilon]^{+}$ is the average positive attackability margin, $d_\Theta = O(d \cdot d_h \cdot K)$ is t
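The average positive attackability margin $\bar{\alpha}$ defined in the theorem can be computed directly from per-round quantities. The sketch below is illustrative only: the function name `average_attackability_margin` and its argument names are hypothetical, $z_t$ is read as a 0/1 attack indicator, and $L_h$ is treated as a given constant (presumably a Lipschitz-type constant; the excerpt does not define it).

```python
from typing import Sequence


def average_attackability_margin(
    gaps: Sequence[float],     # Delta(x_t, a_t^dagger) for each round t
    attacked: Sequence[int],   # z_t: 1 if round t was attacked, else 0 (assumed meaning)
    budget: int,               # B, the attack budget
    l_h: float,                # L_h, taken as a given constant (assumption)
    eps: float,                # epsilon, the perturbation bound
) -> float:
    """alpha_bar = (1/B) * sum over attacked rounds of max(gap - 2 * L_h * eps, 0)."""
    if budget <= 0:
        raise ValueError("budget B must be positive")
    total = sum(
        max(gap - 2.0 * l_h * eps, 0.0)   # the [.]^+ positive part
        for gap, z in zip(gaps, attacked)
        if z == 1
    )
    return total / budget


if __name__ == "__main__":
    # Toy values: rounds 1 and 2 are attacked, round 3 is not.
    print(average_attackability_margin([0.5, 0.1, 0.3], [1, 1, 0], budget=2, l_h=1.0, eps=0.1))
```

In this toy call only the first round contributes, since the second round's gap falls below $2L_h\epsilon$ and is clipped to zero, so the result is approximately 0.15.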

Figures (7)

Figure 1: Performance evaluation of AdvBandit under adversarial settings in terms of regret on real datasets.
Figure 2: Distribution of continuous arm components ($\lambda^{(1)}$ (effectiveness), $\lambda^{(2)}$ (evasion), $\lambda^{(3)}$ (temporal)) across victim algorithms on the Yelp dataset.
Figure 3: Performance of different attack strategies under varying attack budgets, averaged over the real datasets (Yelp, MovieLens, and Disin).
Figure 4: Runtime scalability of attack baselines: (a) horizon $T$ (fixed $B=200$) and (b) attack budget $B$ (fixed $T=5000$). Red stars ($\bigstar$) and vertical lines mark the standard experimental setting ($T=5000$, $B=200$) used throughout the paper.
Figure 5: Distribution of continuous arm components ($\lambda^{(1)}$ (effectiveness), $\lambda^{(2)}$ (evasion), $\lambda^{(3)}$ (temporal)) across victim algorithms on the MovieLens and Disin datasets.
...and 2 more figures

Theorems & Definitions (60)

Definition 3.1: Bandit-Powered Attack
Definition 5.1: Attackability Margin
Theorem 5.2: Victim's Cumulative Regret
Proof sketch
Theorem 5.4: Attacker's Regret under Approximate Realizability
Proof sketch
Remark 5.5
Proposition C.1: Entropy Predicts Attack Success
Proof
Proposition C.2: Weight Predicts Induced Regret
...and 50 more
