Adversarial Attacks on Combinatorial Multi-Armed Bandits

Rishab Balasubramanian; Jiawei Li; Prasad Tadepalli; Huazheng Wang; Qingyun Wu; Haoyu Zhao

TL;DR

The paper studies the vulnerability of combinatorial multi-armed bandits (CMAB) to reward-poisoning attacks. It introduces polynomial attackability, a notion under which the attack cost must scale polynomially in the number of base arms and other problem parameters, and proves a necessary and sufficient condition for it based on the problem's gap structure and triggering probabilities. A key finding is that attack feasibility can depend on whether the bandit instance is known to the attacker, which is a practical barrier to universal attack strategies. The authors propose an attack algorithm and validate it through experiments on probabilistic maximum coverage, online minimum spanning tree, cascading bandits, and online shortest path, showing sublinear attack costs and linear target-arm pulls in attackable instances. These results provide guidance for designing robust CMAB algorithms and highlight the importance of instance- and environment-specific defenses in real-world applications.

Abstract

We study reward poisoning attacks on Combinatorial Multi-armed Bandits (CMAB). We first provide a sufficient and necessary condition for the attackability of CMAB, a notion that captures the vulnerability and robustness of CMAB. The attackability condition depends on the intrinsic properties of the corresponding CMAB instance, such as the reward distributions of super arms and the outcome distributions of base arms. Additionally, we devise an attack algorithm for attackable CMAB instances. Contrary to prior understanding of multi-armed bandits, our work reveals the surprising fact that the attackability of a specific CMAB instance also depends on whether the bandit instance is known or unknown to the adversary. This finding indicates that adversarial attacks on CMAB are difficult in practice, and that a general attack strategy for any CMAB instance does not exist, since the environment is mostly unknown to the adversary. We validate our theoretical findings via extensive experiments on real-world CMAB applications, including probabilistic maximum coverage, online minimum spanning tree, cascading bandits for online ranking, and online shortest path.

Paper Structure (50 sections, 20 theorems, 45 equations, 4 figures, 1 algorithm)

Introduction
Our contribution
Related works
Adversarial attacks on bandits and reinforcement learning
Corruption-tolerant bandits
Preliminary
Combinatorial semi-bandit
CUCB algorithm
Threat model
Selected applications of CMAB
Online minimum spanning tree
Online shortest path
Cascading bandit
Probabilistic maximum coverage
Polynomial Attackability of CMAB Instances
...and 35 more sections

Key Result

Theorem 3.6

Consider a particular CMAB instance and a target set of super arms ${\mathcal{M}}$ to attack. If $\Delta_{{\mathcal{M}}} > 0$, then the CMAB instance is polynomially attackable. If $\Delta_{{\mathcal{M}}} < 0$, then the instance is polynomially unattackable.
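
As a reading aid, the sign test in Theorem 3.6 can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the names `Attackability` and `classify_by_gap` are hypothetical, and the gap $\Delta_{{\mathcal{M}}}$ (Definition 3.5, not reproduced in this summary) is assumed to have been computed elsewhere and passed in as a number. The case $\Delta_{{\mathcal{M}}} = 0$ is not covered by the statement above, so the sketch reports it separately.

```python
from enum import Enum


class Attackability(Enum):
    """Outcomes of the sign test in Theorem 3.6 (names are hypothetical)."""
    POLYNOMIALLY_ATTACKABLE = "polynomially attackable"
    POLYNOMIALLY_UNATTACKABLE = "polynomially unattackable"
    NOT_COVERED = "not covered by the theorem statement"


def classify_by_gap(gap_of_target_set: float) -> Attackability:
    """Classify a CMAB instance from the gap of the target set M.

    Assumption: `gap_of_target_set` is the value of Delta_M from
    Definition 3.5, computed by the caller; this function only
    applies the sign test stated in Theorem 3.6.
    """
    if gap_of_target_set > 0:
        return Attackability.POLYNOMIALLY_ATTACKABLE
    if gap_of_target_set < 0:
        return Attackability.POLYNOMIALLY_UNATTACKABLE
    # The theorem as quoted here says nothing about a gap of exactly zero.
    return Attackability.NOT_COVERED


if __name__ == "__main__":
    # Made-up gap values, used only to exercise the three branches.
    for gap in (0.25, -0.1, 0.0):
        print(gap, "->", classify_by_gap(gap).value)
```

Keeping the zero-gap case as its own outcome avoids claiming more than the theorem states.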

Figures (4)

Figure 1: Example \ref{exp:hard-example} with $n=5$
Figure 2: Cost and target arm pulls for probabilistic max coverage, online minimum spanning tree, online shortest path, and cascading bandits. Experiments are repeated at least $10$ times, and we report the averaged result and its variance.
Figure 3: An unattackable shortest path from $s$ to $t$ in the Flickr dataset. Optimal path: $(s,a,b,e,t)$. Target path: $(s,a,v,c,d,t)$. The cost on $(b,c,d,t)$ is larger than the number of edges on $(b,e,t)$, so the attacker cannot fool the algorithm into playing the target path.
Figure 4: Cost and percentage of base arms selected for Influence Maximization and Probabilistic Maximum Coverage.

Theorems & Definitions (41)

Definition 3.1: Polynomially attackable
Definition 3.2: Polynomially unattackable
Remark 3.3: Conventional attackability definition vs. polynomially attackable vs. polynomially unattackable
Remark 3.4: Polynomial dependency
Definition 3.5: Gap
Theorem 3.6: Polynomial attackability of CMAB
Corollary 3.7
Corollary 3.8
Theorem 3.9
Corollary 3.10: Informal
...and 31 more