Learning to Attack: A Bandit Approach to Adversarial Context Poisoning

Ray Telikani, Amir H. Gandomi

TL;DR

AdvBandit is a black-box adaptive attack that formulates context poisoning as a continuous-armed bandit problem, enabling the attacker to jointly learn and exploit the victim's evolving policy.

Abstract

Neural contextual bandits are vulnerable to adversarial attacks, where subtle perturbations to rewards, actions, or contexts induce suboptimal decisions. We introduce AdvBandit, a black-box adaptive attack that formulates context poisoning as a continuous-armed bandit problem, enabling the attacker to jointly learn and exploit the victim's evolving policy. The attacker requires no access to the victim's internal parameters, reward function, or gradient information; instead, it constructs a surrogate model using a maximum-entropy inverse reinforcement learning module from observed context-action pairs and optimizes perturbations against this surrogate using projected gradient descent. An upper confidence bound-aware Gaussian process guides arm selection. An attack-budget control mechanism is also introduced to limit detection risk and overhead. We provide theoretical guarantees, including sublinear attacker regret and lower bounds on victim regret linear in the number of attacks. Experiments on three real-world datasets (Yelp, MovieLens, and Disin) against various victim contextual bandits demonstrate that our attack model achieves higher cumulative victim regret than state-of-the-art baselines.
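
The abstract describes optimizing perturbations against the learned surrogate with projected gradient descent. Below is a minimal sketch of that step under stated assumptions, not the authors' implementation. It assumes the surrogate is a linear softmax policy with per-arm weights `theta` (one plausible form a maximum-entropy IRL fit could take), that the perturbation bound $\epsilon$ is an L2 ball, and that the attacker wants to raise the surrogate probability of a target arm. The function name `pgd_context_attack`, the step size `lr`, and the step count are hypothetical.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over arm scores
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def pgd_context_attack(x, theta, target, eps, steps=50, lr=0.1):
    """Hypothetical sketch: find delta with ||delta||_2 <= eps that increases
    the surrogate probability pi(target | x + delta), where
    pi(a | x) is proportional to exp(theta[a] @ x) and theta has shape (K, d)."""
    delta = np.zeros_like(x, dtype=float)
    for _ in range(steps):
        p = softmax(theta @ (x + delta))
        # gradient of -log pi(target | x + delta) with respect to delta
        grad = -(theta[target] - p @ theta)
        delta = delta - lr * grad
        # project back onto the L2 ball of radius eps
        norm = np.linalg.norm(delta)
        if norm > eps:
            delta = delta * (eps / norm)
    return delta
```

For example, with `theta = np.array([[1.0, 0.0], [0.0, 1.0]])`, `x = np.array([1.0, 0.0])`, and `target=1`, the returned perturbation shifts the context toward arm 1 while staying inside the radius-`eps` ball. Swapping the L2 projection for element-wise clipping would give the L∞ variant if that is the norm the bound refers to.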

Paper Structure (67 sections, 26 theorems, 95 equations, 7 figures, 12 tables, 2 algorithms)

Key Result

Theorem 5.2

With attack budget $B$, perturbation bound $\epsilon$, and IRL retraining interval $\Delta_{\textsc{irl}}$ with window size $W$, the victim's cumulative regret satisfies with probability at least $1-\rho$: where $\bar{\alpha} = \frac{1}{B}\sum_{t:\,z_t=1}\![\Delta(\mathbf{x}_t, a_t^\dagger) - 2L_h\epsilon]^{+}$ is the average positive attackability margin, $d_\Theta = O(d \cdot d_h \cdot K)$ is t
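
The average attackability margin $\bar{\alpha}$ in Theorem 5.2 is an average of clipped per-round margins over attacked rounds, so it can be computed directly once the per-round quantities are known. The sketch below assumes the values $\Delta(\mathbf{x}_t, a_t^\dagger)$ and the attack indicators $z_t$ are already available as arrays; the function name `average_attackability_margin` is hypothetical. Following the formula as written, it divides by the budget $B$ rather than by the number of attacked rounds.

```python
import numpy as np

def average_attackability_margin(gaps, attacked, L_h, eps, budget):
    """Hypothetical helper computing
    alpha_bar = (1/B) * sum_{t: z_t = 1} [Delta(x_t, a_t^dagger) - 2 * L_h * eps]^+.
    gaps[t]     : Delta(x_t, a_t^dagger), assumed precomputed for each round t
    attacked[t] : z_t in {0, 1}
    budget      : attack budget B"""
    gaps = np.asarray(gaps, dtype=float)
    mask = np.asarray(attacked, dtype=bool)
    # positive part [.]^+ of each attacked round's margin
    margins = np.maximum(gaps[mask] - 2.0 * L_h * eps, 0.0)
    return float(margins.sum() / budget)
```

With `gaps=[0.5, 0.1, 0.8, 0.3]`, `attacked=[1, 1, 0, 1]`, `L_h=1.0`, `eps=0.1`, and `budget=3`, the attacked rounds contribute margins 0.3, 0 (clipped), and 0.1, giving $\bar{\alpha} \approx 0.133$.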

Figures (7)

  • Figure 1: Performance evaluation of AdvBandit under adversarial settings in terms of regret on real datasets.
  • Figure 2: Distribution of continuous arm components ($\lambda^{(1)}$ (effectiveness), $\lambda^{(2)}$ (evasion), $\lambda^{(3)}$ (temporal)) across victim algorithms on the Yelp dataset.
  • Figure 3: Performance of different attack strategies under varying attack budgets: averaged from the real datasets (Yelp, MovieLens, and Disin).
  • Figure 4: Runtime scalability of attack baselines: (a) horizon $T$ (fixed $B=200$) and (b) attack budget $B$ (fixed $T=5000$). Red stars ($\bigstar$) and vertical lines mark the standard experimental setting ($T=5000$, $B=200$) used throughout the paper.
  • Figure 5: Distribution of continuous arm components ($\lambda^{(1)}$ (effectiveness), $\lambda^{(2)}$ (evasion), $\lambda^{(3)}$ (temporal)) across victim algorithms on the MovieLens and Disin datasets.
  • ...and 2 more figures
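
The figure captions describe each continuous arm as a three-component vector $(\lambda^{(1)}, \lambda^{(2)}, \lambda^{(3)})$ (effectiveness, evasion, temporal), and the abstract states that an upper confidence bound-aware Gaussian process guides arm selection. A generic GP-UCB selection step over such arms might look like the sketch below. The domain $[0,1]^3$, the RBF kernel and its length scale, the exploration weight `beta`, maximization over random candidates, and the names `rbf` and `gp_ucb_select` are all assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def rbf(A, B, length=0.3):
    # squared-exponential kernel between rows of A (n, 3) and B (m, 3)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length**2)

def gp_ucb_select(X, y, rng, beta=2.0, noise=1e-3, n_candidates=500):
    """Hypothetical sketch: pick the next arm lambda in [0, 1]^3 by maximizing
    mu + beta * sigma of a GP fitted to past (arm, reward) pairs."""
    cands = rng.uniform(0.0, 1.0, size=(n_candidates, 3))
    if len(X) == 0:
        return cands[0]  # no data yet: any candidate is as good as another
    X = np.asarray(X)
    y = np.asarray(y, dtype=float)
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(cands, X)                       # (m, n) cross-covariances
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks @ alpha                          # posterior mean at candidates
    v = np.linalg.solve(L, Ks.T)             # (n, m)
    var = np.clip(1.0 - (v**2).sum(0), 1e-12, None)  # prior variance k(x, x) = 1
    ucb = mu + beta * np.sqrt(var)
    return cands[np.argmax(ucb)]
```

In use, the attacker would call `gp_ucb_select` once per round, observe an attack reward for the chosen arm, and append the pair to `X` and `y`; how AdvBandit defines that reward is not specified in this summary.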

Theorems & Definitions (60)

  • Definition 3.1: Bandit-Powered Attack
  • Definition 5.1: Attackability Margin
  • Theorem 5.2: Victim's Cumulative Regret
  • Proof sketch
  • Theorem 5.4: Attacker's Regret under Approximate Realizability
  • Proof sketch
  • Remark 5.5
  • Proposition C.1: Entropy Predicts Attack Success
  • Proof
  • Proposition C.2: Weight Predicts Induced Regret
  • ...and 50 more