Near Optimal Adversarial Attacks on Stochastic Bandits and Defenses with Smoothed Responses

Shiliang Zuo

TL;DR

This work studies the robustness of stochastic multi-armed bandits under a strong adversary that can corrupt observed rewards subject to a budget. It derives near-optimal attack strategies against UCB and Thompson Sampling that achieve a cumulative corruption cost of $\widehat{O}(\sqrt{\log T})$, proves matching lower bounds, and gives an $\Omega(\log T)$ lower bound for $\varepsilon$-greedy. To counteract such attacks, it proposes defenses grounded in smoothed analysis and behavioral economics, yielding two simple algorithms (Smoothed Myopic Response and Quantal Response) that achieve a competitive ratio arbitrarily close to 1 when the corruption budget is sublinear in $T$. The paper also provides experimental validation showing substantial improvements over prior attacks and robust performance of the proposed defenses. Together, the results characterize how vulnerable classical bandit algorithms are to strong adversaries and offer simple defenses with provable performance guarantees.

Abstract

I study adversarial attacks against stochastic bandit algorithms. At each round, the learner chooses an arm, and a stochastic reward is generated. The adversary strategically adds corruption to the reward, and the learner observes only the corrupted reward at each round. Two sets of results are presented in this paper. The first set studies the optimal attack strategies for the adversary. The adversary has a target arm he wishes to promote, and his goal is to manipulate the learner into choosing this target arm $T - o(T)$ times. I design attack strategies against UCB and Thompson Sampling that spend only $\widehat{O}(\sqrt{\log T})$ cost. Matching lower bounds are presented, and the vulnerability of UCB, Thompson Sampling, and $\varepsilon$-greedy is exactly characterized. The second set studies how the learner can defend against the adversary. Inspired by literature on smoothed analysis and behavioral economics, I present two simple algorithms that achieve a competitive ratio arbitrarily close to 1.
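The interaction protocol described in the abstract (the learner picks an arm, a stochastic reward is drawn, the adversary adds corruption, and the learner sees only the corrupted value) can be illustrated with a small simulation. The sketch below is not the paper's near-optimal attack: it uses a deliberately naive adversary that subtracts a fixed offset from every non-target pull while a budget lasts. The function name `run_ucb`, the parameters `delta` and `budget`, the Gaussian noise with standard deviation 0.1, and the arm means are all assumptions chosen for illustration. The learner is standard UCB1.

```python
import math
import random


def run_ucb(means, horizon, target=None, delta=0.0, budget=0.0, seed=0):
    """Run UCB1 on Gaussian arms, optionally corrupting non-target rewards.

    Hypothetical helper for illustration only.
    Returns (pull counts per arm, total corruption spent).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k  # sums of observed (possibly corrupted) rewards
    spent = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialise the estimates
        else:
            # UCB1 index: empirical mean plus exploration bonus
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )

        reward = rng.gauss(means[arm], 0.1)  # true stochastic reward (assumed noise)

        # Naive attack (an assumption, not the paper's strategy):
        # lower every non-target reward by `delta` while the budget lasts.
        corruption = 0.0
        if target is not None and arm != target and spent + delta <= budget:
            corruption = -delta
            spent += delta

        observed = reward + corruption  # the learner only ever sees this value
        counts[arm] += 1
        sums[arm] += observed

    return counts, spent


means = [0.9, 0.5, 0.2]  # arm 2 is the worst arm and the adversary's target
clean_counts, _ = run_ucb(means, horizon=5000)
attacked_counts, cost = run_ucb(
    means, horizon=5000, target=2, delta=1.0, budget=1000.0
)
```

Comparing `clean_counts` with `attacked_counts` shows the learner shifting most of its pulls from the best arm to the target arm, and `cost` records how much corruption this naive adversary spent. Because it corrupts every non-target pull, its cost grows with the number of such pulls, which is the kind of spending the attack strategies in the paper are designed to reduce.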
