Naive Algorithmic Collusion: When Do Bandit Learners Cooperate and When Do They Compete?

Connor Douglas, Foster Provost, Arun Sundararajan

TL;DR

This work studies the emergent behavior of multi-armed bandit learning algorithms deployed by competing agents that have no information about the strategic interaction they are engaged in. It shows that these context-free bandits will consistently learn collusive behavior.

Abstract

Algorithmic agents are used in a variety of competitive decision settings, notably in making pricing decisions in contexts that range from online retail to residential home rentals. Business managers, algorithm designers, legal scholars, and regulators alike are all starting to consider the ramifications of "algorithmic collusion." We study the emergent behavior of multi-armed bandit machine learning algorithms used in situations where agents are competing but have no information about the strategic interaction they are engaged in. Using a general-form repeated Prisoner's Dilemma game, agents engage in online learning with no prior model of game structure and no knowledge of competitors' states or actions (e.g., no observation of competing prices). We show that these context-free bandits, with no knowledge of opponents' choices or outcomes, will still consistently learn collusive behavior, which we call "naive collusion." We primarily study this system through an analytical model and examine perturbations to the model through simulations. Our findings have several notable implications for regulators. First, calls to prohibit algorithms from conditioning on competitors' prices are insufficient to prevent algorithmic collusion. This is a direct result of collusion arising even in the naive setting. Second, symmetry in algorithms can increase collusion potential. This highlights a new, simple mechanism for "hub-and-spoke" algorithmic collusion. A central distributor need not imbue its algorithm with supra-competitive tendencies for apparent collusion to arise; it can arise simply from the use of certain (common) machine learning algorithms. Finally, we show that collusive outcomes depend starkly on the specific algorithm being used, and we identify market and algorithmic conditions under which it will be unknown a priori whether collusion occurs.

Paper Structure

This paper contains 30 sections, 7 theorems, 16 equations, 5 figures, and 2 tables.

Key Result

Lemma 1

With path-invariant bandits, $s_t$ adheres to the Markov property.
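
For reference, the Markov property named in Lemma 1 can be written out in its standard form. This listing does not define $s_t$, so the line below assumes it denotes the state of the learning system at round $t$; that reading is an assumption, not the paper's definition.

$$\Pr(s_{t+1} \mid s_t, s_{t-1}, \dots, s_0) = \Pr(s_{t+1} \mid s_t)$$

Read this way, the distribution of the next state depends only on the current state, not on the path of play that led to it.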

Figures (5)

  • Figure 1: Value estimates for each agent's action across a 10,000 round Prisoner's Dilemma with epsilon-greedy agents, for two sets of game parameters and epsilon values. In both cases, the agents ultimately learn to compete (i.e., that playing L has higher expected payoff).
  • Figure 2: Value estimates for each agent's action across a 10,000 round Prisoner's Dilemma with UCB agents, for two sets of game parameters and delta values. In both cases, the agents ultimately learn to collude (i.e., that playing H has higher expected payoff).
  • Figure 3: Value estimates for each agent's action across a 10,000 round Prisoner's Dilemma with epsilon-decay agents.
  • Figure 4: Proportion of games with epsilon-decay agents ending in a collusive equilibrium.
  • Figure 5: Proportion of games with asymmetric UCB agents ending in a collusive equilibrium.
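
The setup behind these figures, two context-free bandits that each see only their own payoff in a repeated Prisoner's Dilemma, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the payoff values, the `simulate` helper, the random tie-breaking, and the textbook UCB1 bonus with an exploration constant `c` (rather than the paper's delta parametrization) are all assumptions made for the example.

```python
import math
import random

ACTIONS = ("H", "L")  # H = high price (cooperate), L = low price (compete)

# Hypothetical payoffs to a player, keyed by (own action, rival action).
# Illustrative values only; they satisfy the usual ordering 5 > 3 > 1 > 0.
PAYOFF = {("H", "H"): 3.0, ("H", "L"): 0.0, ("L", "H"): 5.0, ("L", "L"): 1.0}


class Bandit:
    """Keeps a sample-mean value estimate per action; sees only its own reward."""

    def __init__(self, rng):
        self.rng = rng
        self.counts = {a: 0 for a in ACTIONS}
        self.values = {a: 0.0 for a in ACTIONS}

    def _argmax(self, scores):
        # Break ties uniformly at random so neither action is favored.
        best = max(scores.values())
        return self.rng.choice([a for a in ACTIONS if scores[a] == best])

    def update(self, action, reward):
        self.counts[action] += 1
        # Incremental update of the sample mean for the chosen action.
        self.values[action] += (reward - self.values[action]) / self.counts[action]


class EpsilonGreedy(Bandit):
    def __init__(self, rng, epsilon=0.1):
        super().__init__(rng)
        self.epsilon = epsilon

    def choose(self, t):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)  # explore
        return self._argmax(self.values)     # exploit


class UCB1(Bandit):
    def __init__(self, rng, c=1.0):
        super().__init__(rng)
        self.c = c  # exploration constant (assumed; not the paper's delta)

    def choose(self, t):
        for a in ACTIONS:  # try every action once before using the bonus
            if self.counts[a] == 0:
                return a
        scores = {
            a: self.values[a] + self.c * math.sqrt(2 * math.log(t) / self.counts[a])
            for a in ACTIONS
        }
        return self._argmax(scores)


def simulate(make_agent, rounds=10_000, seed=0):
    """Play the repeated game; return final value estimates and the (H, H) share."""
    rng = random.Random(seed)
    agents = [make_agent(rng), make_agent(rng)]
    both_high = 0
    for t in range(1, rounds + 1):
        a0, a1 = agents[0].choose(t), agents[1].choose(t)
        # Each agent is told only its own payoff, never the rival's action.
        agents[0].update(a0, PAYOFF[(a0, a1)])
        agents[1].update(a1, PAYOFF[(a1, a0)])
        both_high += int(a0 == "H" and a1 == "H")
    return [dict(agent.values) for agent in agents], both_high / rounds


if __name__ == "__main__":
    for name, factory in [("epsilon-greedy", EpsilonGreedy), ("UCB1", UCB1)]:
        values, share = simulate(factory)
        print(name, values, f"share of (H, H) rounds: {share:.3f}")
```

Comparing the final value estimates for H and L across the two agent types is the quantity the figures track. Whether a given run ends in collusion or competition depends on the payoffs, the exploration parameters, and the seed, so no particular outcome should be read into this sketch.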

Theorems & Definitions (18)

  • Definition 1
  • Definition 2
  • Definition 3
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • proof
  • Proposition 1
  • proof
  • ...and 8 more