Algorithmic pricing with independent learners and relative experience replay
Bingyan Han
TL;DR
This work investigates the risk of algorithmic collusion in infinitely repeated pricing games played by independent RL agents. It introduces relative experience replay (relative ER), which embeds each agent's relative performance (RP) into replay sampling through an RP coefficient $\lambda$ that modulates how likely the agent is to revisit transitions in which it underperforms. Empirically, negative RP ($\lambda<0$) drives convergence to Bertrand-Nash pricing, while positive RP ($\lambda>0$) supports supra-competitive, monopoly-like pricing and can mitigate overfitting. These effects persist under noise, asymmetry, and different numbers of agents, and they extend to deep Q-learning, where positive RP speeds up convergence. The findings highlight RP as a controllable factor shaping learning dynamics in general-sum pricing games, with implications for antitrust policy and the design of robust multi-agent RL systems. Open questions remain about sparse rewards and convergence guarantees.
Abstract
In an infinitely repeated general-sum pricing game, independent reinforcement learners may exhibit collusive behavior without any communication, raising concerns about algorithmic collusion. To better understand the learning dynamics, we incorporate agents' relative performance (RP) among competitors using experience replay (ER) techniques. Experimental results indicate that RP considerations play a critical role in long-run outcomes. Agents that are averse to underperformance converge to the Bertrand-Nash equilibrium, while those more tolerant of underperformance tend to charge supra-competitive prices. Relative ER also helps mitigate the overfitting issue in independent Q-learning. Additionally, the impact of relative ER varies with the number of agents and the choice of algorithm.
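To make the idea of RP-weighted replay concrete, the following is a minimal sketch of how a replay buffer could bias sampling by relative performance. The summary above does not state the sampling rule, so the exponential (softmax) form, the sign convention, and the names `relative_replay_probs` and `sample_batch` are all assumptions made for illustration; the paper's actual rule may differ, including in the direction of the effect of $\lambda$.

```python
import numpy as np

def relative_replay_probs(own_rewards, rival_rewards, lam):
    """Hypothetical RP-weighted sampling probabilities over stored transitions.

    own_rewards:   shape (n_transitions,), the agent's own reward per transition
    rival_rewards: shape (n_transitions, n_rivals), competitors' rewards
    lam:           RP coefficient; lam = 0 recovers uniform replay
    """
    own = np.asarray(own_rewards, dtype=float)
    rivals = np.asarray(rival_rewards, dtype=float)
    # Underperformance gap: positive when the agent earns less than its rivals on average.
    gap = rivals.mean(axis=1) - own
    # ASSUMPTION: softmax weighting in lam * gap; this form and its sign are illustrative only.
    logits = lam * gap
    logits -= logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

def sample_batch(rng, probs, batch_size):
    """Draw transition indices (with replacement) according to probs."""
    return rng.choice(len(probs), size=batch_size, replace=True, p=probs)

# Toy buffer: three transitions, one rival.
own = [1.0, 0.5, 0.2]
rivals = [[1.0], [0.8], [0.9]]

probs_uniform = relative_replay_probs(own, rivals, lam=0.0)
probs_pos = relative_replay_probs(own, rivals, lam=2.0)
probs_neg = relative_replay_probs(own, rivals, lam=-2.0)

rng = np.random.default_rng(0)
batch = sample_batch(rng, probs_pos, batch_size=4)
```

Under this assumed convention, `lam > 0` makes the transition with the largest underperformance gap the most likely to be replayed, `lam < 0` makes it the least likely, and `lam = 0` reduces to standard uniform experience replay.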
