Online Optimization Algorithms in Repeated Price Competition: Equilibrium Learning and Algorithmic Collusion

Martin Bichler; Julius Durmann; Matthias Oberlechner

TL;DR

The paper investigates whether online pricing algorithms used in repeated Bertrand competition converge to competitive Nash equilibrium outcomes or enable tacit collusion. It establishes that mean-based bandit algorithms converge to the set of correlated rationalizable strategies, which coincides with the Nash equilibria (NE) in important Bertrand settings with all-or-nothing or linear demand; under certain conditions, this yields last-iterate convergence to NE. Complementary experiments show that algorithmic collusion is rare in these experiments: it occurs mainly in symmetric installations of UCB or Q-learning with few firms, and it weakens as the number of competitors increases or when heterogeneous algorithms interact. The results imply that the risk of widespread algorithm-driven collusion may be overstated in realistic pricing environments, and they give regulators and managers nuanced guidance on monitoring and tool deployment.
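To make the equilibrium benchmark concrete, the following Python sketch enumerates the pure-strategy Nash equilibria of a one-shot Bertrand game with all-or-nothing demand on a discrete price grid. The model details are assumptions for illustration, not the paper's exact setup: one unit of demand, a common marginal cost, integer prices, and an equal split of demand among firms tied at the lowest price. The helper names `allornothing_profits` and `pure_nash_equilibria` are hypothetical.

```python
from itertools import product


def allornothing_profits(prices, cost):
    """Stage-game profits under all-or-nothing Bertrand demand (assumed model):
    the firm(s) with the lowest price split one unit of demand equally,
    every other firm sells nothing."""
    low = min(prices)
    winners = sum(1 for p in prices if p == low)
    return [(p - cost) / winners if p == low else 0.0 for p in prices]


def pure_nash_equilibria(grid, n_firms, cost):
    """Enumerate pure-strategy Nash equilibria of the one-shot game on a price
    grid by checking every unilateral deviation (brute force, small grids only)."""
    grid = list(grid)
    equilibria = []
    for profile in product(grid, repeat=n_firms):
        payoffs = allornothing_profits(profile, cost)
        # A profile is an equilibrium if no firm gains strictly by deviating.
        if all(
            allornothing_profits(profile[:i] + (dev,) + profile[i + 1:], cost)[i]
            <= payoffs[i]
            for i in range(n_firms)
            for dev in grid
        ):
            equilibria.append(profile)
    return equilibria


print(pure_nash_equilibria(range(11), n_firms=2, cost=2))
# prints [(2, 2), (3, 3), (4, 4)]
```

On a discrete grid the equilibrium set is not a single marginal-cost price: with cost 2 the duopoly has equilibria at cost and at one and two steps above it, and at `(4, 4)` undercutting to 3 leaves profit unchanged rather than raising it, so that profile is only a weak equilibrium. With three firms, `pure_nash_equilibria(range(11), 3, 2)` also returns asymmetric profiles such as `(2, 2, 5)`. The same brute-force check applies to other demand models, such as linear demand, once the profit function is replaced.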

Abstract

This paper examines whether widely used online learning algorithms in pricing independently reach competitive outcomes or instead foster tacit collusion. The issue has drawn considerable attention from competition regulators as algorithmic pricing becomes more common in digital markets, and understanding when such algorithms lead to equilibrium prices or to supra-competitive prices is critical for buyers, sellers, and policymakers. We study the behavior of multi-armed bandit algorithms in repeated price competition. These algorithms observe only the profit of the price they chose, which makes them realistic models of automated pricing. Our formal analysis shows that an important class of online learning algorithms, called mean-based algorithms, reliably converges to Nash equilibrium in Bertrand competition. This finding is notable because online learning algorithms are, in general, not guaranteed to converge. We also run extensive numerical experiments with different bandit algorithms, confirming that most widely used algorithms, including those that are not mean-based, converge to equilibrium. We observe supra-competitive prices only in specific cases where all sellers implement the same symmetric version of certain algorithms, such as UCB or Q-learning, and this effect diminishes as the number of competitors increases. Our results highlight that the risk of algorithmic collusion in competitive markets is often overstated: for most practical implementations of bandit algorithms, sellers' prices converge to competitive levels, and prices remain above competitive benchmarks only under very specific, symmetric setups. These insights support regulators concerned with consumer welfare and managers considering algorithmic pricing tools: while vigilance is warranted, fears of widespread algorithm-driven collusion may be exaggerated.
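The repeated game with bandit feedback can be sketched in a few dozen lines of Python. Each firm picks a price index from a grid and observes only its own profit, rescaled to [0, 1]. Two learners are included as assumptions for illustration: Exp3, commonly cited as an example of a mean-based algorithm, and UCB1 with deterministic tie-breaking. The paper's exact algorithm variants, parameters (here `gamma=0.05` and 20,000 rounds), and demand model may differ, and all class and function names are hypothetical.

```python
import math
import random
from collections import Counter


def allornothing_profits(prices, cost):
    """All-or-nothing Bertrand demand (assumed): lowest price(s) split one unit."""
    low = min(prices)
    winners = sum(1 for p in prices if p == low)
    return [(p - cost) / winners if p == low else 0.0 for p in prices]


class Exp3:
    """Exp3 (Auer et al., 2002), assumed here as an example of a mean-based learner."""

    def __init__(self, n_arms, gamma, rng):
        self.n, self.gamma, self.rng = n_arms, gamma, rng
        self.log_w = [0.0] * n_arms  # log-weights avoid overflow in long runs
        self.arm, self.p = None, None

    def probs(self):
        m = max(self.log_w)
        w = [math.exp(lw - m) for lw in self.log_w]
        s = sum(w)
        # Mix exponential weights with uniform exploration at rate gamma.
        return [(1 - self.gamma) * wi / s + self.gamma / self.n for wi in w]

    def act(self):
        self.p = self.probs()
        self.arm = self.rng.choices(range(self.n), weights=self.p)[0]
        return self.arm

    def update(self, reward):
        # Bandit feedback: only the chosen arm's reward is observed, so use
        # the importance-weighted estimate reward / P(chosen arm).
        est = reward / self.p[self.arm]
        self.log_w[self.arm] += self.gamma * est / self.n


class UCB1:
    """UCB1 with deterministic tie-breaking (lowest arm index wins ties)."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms
        self.t = 0
        self.arm = None

    def act(self):
        self.t += 1
        if 0 in self.counts:  # play every arm once before using the index
            self.arm = self.counts.index(0)
        else:
            scores = [s / c + math.sqrt(2 * math.log(self.t) / c)
                      for s, c in zip(self.sums, self.counts)]
            self.arm = scores.index(max(scores))
        return self.arm

    def update(self, reward):
        self.counts[self.arm] += 1
        self.sums[self.arm] += reward


def simulate(agents, grid, cost, rounds):
    """Repeated Bertrand game; each firm observes only its own profit."""
    scale = max(grid) - cost  # maps profits into [0, 1]; assumes cost <= min(grid)
    history = []
    for _ in range(rounds):
        prices = [grid[a.act()] for a in agents]
        for agent, profit in zip(agents, allornothing_profits(prices, cost)):
            agent.update(profit / scale)
        history.append(tuple(prices))
    return history


if __name__ == "__main__":
    grid, cost, T = list(range(2, 11)), 2, 20000
    rng = random.Random(0)
    runs = {
        "Exp3 vs Exp3": [Exp3(len(grid), 0.05, rng) for _ in range(2)],
        "UCB1 vs UCB1": [UCB1(len(grid)) for _ in range(2)],
        "UCB1 vs Exp3": [UCB1(len(grid)), Exp3(len(grid), 0.05, rng)],
    }
    for name, agents in runs.items():
        tail = simulate(agents, grid, cost, T)[-1000:]
        mean_price = sum(sum(p) for p in tail) / (2 * len(tail))
        modal = Counter(tail).most_common(1)[0][0]
        print(f"{name}: mean price {mean_price:.2f}, most common profile {modal}")
```

One property of this toy setup can be checked by hand: two UCB1 firms with identical tie-breaking start from identical states, always choose the same price, always split demand, and therefore keep identical states, so each effectively optimizes (p - c)/2 and spends most rounds at the top of the grid. This is one possible mechanism consistent with the symmetric-UCB observation above, not necessarily the paper's explanation. The Exp3 and mixed runs are stochastic, so their printed averages depend on the seed and parameters.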
