Learning Collusion in Episodic, Inventory-Constrained Markets

Paul Friedrich, Barna Pásztor, Giorgia Ramponi

TL;DR

This work addresses tacit collusion in pricing by modeling episodic, inventory-constrained markets (e.g., airline revenue management) as finite-horizon Markov games. It introduces a formal framework, a collusion measure based on Nash and monopolistic reference prices, and a numerical method to compute these reference prices when closed-form solutions do not exist. Through experiments with PPO and DQN agents, the study shows that learned strategies can converge to collusive pricing under realistic constraints, with PPO achieving higher levels of collusion than DQN. The results demonstrate the potential for RL-driven pricing to exhibit collusion in practical markets and motivate the development of regulatory and mitigation strategies.

Abstract

Pricing algorithms have demonstrated the capability to learn tacit collusion that is largely unaddressed by current regulations. Their increasing use in markets, including oligopolistic industries with a history of collusion, calls for closer examination by competition authorities. In this paper, we extend the study of tacit collusion in learning algorithms from basic pricing games to more complex markets characterized by perishable goods with fixed supply and sell-by dates, such as airline tickets, perishables, and hotel rooms. We formalize collusion within this framework and introduce a metric based on price levels under both the competitive (Nash) equilibrium and collusive (monopolistic) optimum. Since no analytical expressions for these price levels exist, we propose an efficient computational approach to derive them. Through experiments, we demonstrate that deep reinforcement learning agents can learn to collude in this more complex domain. Additionally, we analyze the underlying mechanisms and structures of the collusive strategies these agents adopt.
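
The abstract describes the collusion metric only in words. As a minimal sketch, assuming the normalized form that is common in the algorithmic-collusion literature (0 at the competitive Nash level, 1 at the monopolistic level), it could be computed as below. The function name `collusion_index` and the example numbers are hypothetical and not taken from the paper, whose exact definition (Definition 3.2) may differ.

```python
def collusion_index(observed: float, nash: float, monopoly: float) -> float:
    """Position of `observed` between the Nash and monopoly reference levels.

    Assumed form: (observed - nash) / (monopoly - nash).
    Returns 0.0 at the Nash level and 1.0 at the monopoly level.
    """
    if monopoly == nash:
        # The index is undefined when both reference levels coincide.
        raise ValueError("Nash and monopoly reference levels must differ")
    return (observed - nash) / (monopoly - nash)


# Hypothetical reference prices, for illustration only.
halfway = collusion_index(observed=1.5, nash=1.0, monopoly=2.0)
```

Values below 0 or above 1 are possible under this assumed form and would indicate pricing below the competitive level or above the monopolistic level.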

Paper Structure

This paper contains 44 sections, 1 theorem, 15 equations, 18 figures, 3 tables.

Key Result

Lemma 5.2

Given a Markov game with deterministic transitions, let $p^\ast = (p_1^{\ast},\ldots,p_n^{\ast})$ be the solution to eq:gnep and define $\pi^\ast = (\pi_1^\ast, \ldots, \pi_n^\ast)$ as $\pi_i^\ast(s_t) = p^{\ast}_{i,t}$ for all $i$, $t$, and $s_t \in \mathcal{S}$. Then $\pi^\ast$ is a Nash equilibrium…
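
The construction in Lemma 5.2 gives each agent a policy that plays a precomputed price at each timestep, whatever the rest of the state looks like. A minimal sketch of that mapping follows. It assumes a hypothetical state representation in which the first tuple entry is the timestep $t$; the names `open_loop_policy` and `price_schedule` are illustrative and not from the paper.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical state layout: (timestep, remaining inventories..., last prices...).
State = Tuple[float, ...]


def open_loop_policy(price_schedule: Sequence[float]) -> Callable[[State], float]:
    """Build pi_i(s_t) = p_{i,t}: the price depends only on the timestep t."""

    def policy(state: State) -> float:
        t = int(state[0])  # Only the timestep is read; the rest is ignored.
        return price_schedule[t]

    return policy


# Illustrative three-period schedule for one agent (made-up numbers).
pi_1 = open_loop_policy([1.8, 1.6, 1.4])
price_at_t1 = pi_1((1, 5, 5))
```

Because the returned policy ignores everything except $t$, two states that share a timestep but differ in inventory or past prices map to the same action, which matches the "for all $s_t \in \mathcal{S}$" part of the definition.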

Figures (18)

  • Figure 1: One-period equilibrium price levels as a function of inventory capacity for two equally constrained agents.
  • Figure 2: Evolution of training two DQN and two PPO agents in our model, showing average agent actions per episode (DQNs top left, PPOs top right) and collusion index (bottom) with collusive and competitive actions indicated with the green upper and red lower dashed lines respectively. In DQN's training plot, the dotted lines are the greedy actions that DQN would have chosen. Both DQN and PPO first converge to competition before gradually rising toward collusion.
  • Figure 3: Behavior of two DQN agents during an episode after forcing one agent to deviate at time $t=1$ and $t=9$ respectively. Dotted lines indicate evolution without deviation. Deviations provoke a competitive reaction, with both agents quickly returning to collusion.
  • Figure 4: The surfaces show DQN agent 1's learned best response under its greedy policy (i.e., the action with the highest Q-value) to a state given by both agents' prices (x- and y-axes), timestep, and symmetric remaining inventory level.
  • Figure 5: Convergence and collusion metrics for DQN and PPO training runs with varied learning rate. Collusion is robust against varying (yet sufficiently large) learning rate.
  • ...and 13 more figures

Theorems & Definitions (6)

  • Definition 3.1: Competitive & collusive solutions
  • Definition 3.2: Collusion measure
  • Definition 5.1
  • Lemma 5.2
  • Proof
  • Definition D.1