Computational Intractability of Strategizing against Online Learners
Angelos Assos, Yuval Dagan, Nived Rajaraman
TL;DR
This work addresses how an optimizer should strategize against online no-regret learners in general two-player repeated games. It establishes a strong NP-hardness result: unless $\mathsf{P} = \mathsf{NP}$, no polynomial-time optimizer can compute a near-optimal strategy against Hedge/MWU over a horizon $T$, even when the action spaces grow only as $\mathrm{poly}(T)$ and the learner uses a slow learning rate $\eta = 1/T^{1-\alpha}$. The authors prove an $\Omega(T)$ additive hardness gap by a reduction from $(1,2)$-maxTSP. They use a carefully constructed game, in one version with an initialization phase and in another without it, and show that any near-optimal policy would yield a good maxTSP solution, contradicting the known hardness of that problem. This result reveals a fundamental computational barrier to optimizing against learning agents in general games, suggesting that efficient optimizers may exist only for highly structured settings. The work also explains why earlier, weaker hardness results for fictitious-play-type learners do not extend to standard no-regret learners, strengthening the case for structure-aware approaches in practice.
Abstract
Online learning algorithms are widely used in strategic multi-agent settings, including repeated auctions, contract design, and pricing competitions, where agents adapt their strategies over time. A key question in such environments is how an optimizing agent can best respond to a learning agent to improve its own long-term outcomes. While prior work has developed efficient algorithms for the optimizer in special cases, such as structured auction settings or contract design, no general efficient algorithm is known. In this paper, we establish a strong computational hardness result: unless $\mathsf{P} = \mathsf{NP}$, no polynomial-time optimizer can compute a near-optimal strategy against a learner using a standard no-regret algorithm, specifically Multiplicative Weights Update (MWU). Our result proves an $\Omega(T)$ hardness bound, significantly strengthening previous work that only showed an additive $\Theta(1)$ impossibility result. Furthermore, while the prior hardness result focused on learners using fictitious play, an algorithm that is not no-regret, we prove intractability for a widely used no-regret learning algorithm. This establishes a fundamental computational barrier to finding optimal strategies in general game-theoretic settings.
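To make the setting concrete, the following is a minimal sketch of an optimizer playing a fixed action sequence against a Hedge/MWU learner. It is an illustration under stated assumptions, not the paper's construction: the payoff matrices `A` (optimizer) and `B` (learner), the function names, the default `alpha`, and the choice to have the learner update on the payoffs induced by the optimizer's realized action are all hypothetical. The learning rate follows the form $\eta = 1/T^{1-\alpha}$ from the summary above.

```python
import math

def hedge_distribution(cum_payoffs, eta):
    """Hedge/MWU mixed strategy: probability proportional to exp(eta * cumulative payoff)."""
    m = max(cum_payoffs)  # subtract the max for numerical stability
    weights = [math.exp(eta * (c - m)) for c in cum_payoffs]
    total = sum(weights)
    return [w / total for w in weights]

def play_against_hedge(A, B, optimizer_actions, alpha=0.5):
    """Expected total optimizer payoff for a fixed action sequence.

    A[i][j]: optimizer payoff, B[i][j]: learner payoff, when the optimizer
    plays action i and the learner plays action j (hypothetical convention).
    """
    T = len(optimizer_actions)
    eta = 1.0 / T ** (1.0 - alpha)  # learning rate eta = 1 / T^(1 - alpha)
    cum = [0.0] * len(B[0])         # learner's cumulative payoff per action
    total = 0.0
    for i in optimizer_actions:
        p = hedge_distribution(cum, eta)  # learner's mixed strategy this round
        total += sum(A[i][j] * p[j] for j in range(len(p)))
        cum = [cum[j] + B[i][j] for j in range(len(p))]  # learner observes round payoffs
    return total
```

Evaluating one sequence this way is cheap; the difficulty lies in choosing the sequence. An optimizer with $n$ actions faces $n^T$ candidate sequences, and the hardness result says that, unless $\mathsf{P} = \mathsf{NP}$, no polynomial-time method is guaranteed to find a near-optimal one in general games.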
