Strategizing against Q-learners: A Control-theoretical Approach
Yuksel Arslantas, Ege Yuceel, Muhammed O. Sayin
TL;DR
This work analyzes, through a control-theoretic lens, how strategically sophisticated opponents can manipulate naive independent Q-learners in repeated normal-form games. It models the interaction as a stochastic game whose continuum state is given by the naive Q-learners' Q-function estimates, and establishes Lipschitz continuity of the value functions so that quantization-based approximations come with accuracy guarantees. A quantization mapping $\Phi$ reduces the continuum-state game to a finite stochastic game, with explicit error bounds that scale with the approximation granularity and problem parameters, enabling minimax value iteration in two-player settings. Empirical results in the Prisoner's Dilemma and related games show that strategic actors can increase their payoff by steering Q-learners, highlighting vulnerabilities and providing a foundation for designing more robust learning algorithms.
Abstract
In this paper, we explore the susceptibility of independent Q-learning algorithms (a classical and widely used multi-agent reinforcement learning method) to strategic manipulation by sophisticated opponents in normal-form games played repeatedly. We quantify how much strategically sophisticated agents can exploit naive Q-learners if they know the opponents' Q-learning algorithm. To this end, we formulate the strategic actors' interactions as a stochastic game whose state encompasses the Q-function estimates of the Q-learners, treating the Q-learning algorithms as the underlying dynamical system. We also present a quantization-based approximation scheme to tackle the continuum state space, and analyze its performance both analytically and numerically for two settings: two competing strategic actors and a single strategic actor.
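To make the formulation concrete, the following is a minimal sketch of the single-strategic-actor case, not the paper's implementation. It treats a stateless Q-learner's update as the system dynamics, maps the Q-values to a finite grid, and runs value iteration for the strategic actor on the resulting finite model. The payoff matrices, the step size `ALPHA`, the discount factors `GAMMA` and `DELTA`, the grid, the greedy learner policy, and the nearest-neighbor form of the quantizer are all assumptions chosen for illustration.

```python
import itertools
import numpy as np

# Prisoner's Dilemma (illustrative payoffs); actions: 0 = cooperate, 1 = defect.
# Entry [a_learner, a_strategic] is the payoff to the named player.
R_LEARNER = np.array([[3.0, 0.0], [5.0, 1.0]])
R_STRAT = np.array([[3.0, 5.0], [0.0, 1.0]])

ALPHA = 0.5   # learner step size (assumed)
GAMMA = 0.5   # learner discount factor (assumed)
DELTA = 0.9   # strategic actor's discount factor (assumed)
# Per-action grid; 10 = max learner reward / (1 - GAMMA) bounds the Q-values.
GRID = np.linspace(0.0, 10.0, 21)


def q_update(q, a_l, a_s):
    """One stateless Q-learning step on the learner's chosen action."""
    q_next = q.copy()
    target = R_LEARNER[a_l, a_s] + GAMMA * q.max()
    q_next[a_l] += ALPHA * (target - q[a_l])
    return q_next


def quantize(q):
    """Map each Q-value to the index of its nearest grid point."""
    return tuple(int(np.abs(GRID - v).argmin()) for v in q)


def solve(num_iters=200):
    """Value iteration for one strategic actor on the quantized state space."""
    states = list(itertools.product(range(len(GRID)), repeat=2))
    # Precompute (reward, next state) for every state and strategic action.
    model = {}
    for s in states:
        q = GRID[list(s)]
        a_l = int(q.argmax())  # greedy learner (assumed; no exploration)
        model[s] = [
            (R_STRAT[a_l, a_s], quantize(q_update(q, a_l, a_s)))
            for a_s in (0, 1)
        ]
    value = {s: 0.0 for s in states}
    for _ in range(num_iters):
        value = {
            s: max(r + DELTA * value[s2] for r, s2 in model[s]) for s in states
        }
    policy = {
        s: int(np.argmax([r + DELTA * value[s2] for r, s2 in model[s]]))
        for s in states
    }
    return value, policy
```

Calling `value, policy = solve()` and then `policy[quantize(q)]` gives the strategic actor's action for a learner whose current estimates are `q`. With a coarse grid or a small step size, an update can round back to the cell it started from, so the quantized model may freeze where the true dynamics would move; this is one reason the approximation error depends on the grid granularity.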
