Beyond Human Intervention: Algorithmic Collusion through Multi-Agent Learning Strategies

Suzie Grondin, Arthur Charpentier, Philipp Ratz

TL;DR

The paper investigates pricing with multi-agent reinforcement learning, showing that simple collusive outcomes hinge on symmetry and stationary policies. By reframing pricing as a multi-objective optimization and incorporating opponent modelling with function approximation, online-offline data buffers, and adaptive exploration, the authors demonstrate that robust, supra-competitive pricing can emerge under broader conditions. The work argues that regulatory concerns should focus on the reward structures and learning dynamics rather than explicit cartel-like agreements, highlighting the potential for rapid adaptation and nonstationarity in real markets. Overall, the findings underscore the need for policy that accounts for algorithmic pricing and the continual evolution of agent strategies in competitive environments.

Abstract

Collusion in market pricing is a concept associated with human actions to raise market prices through artificially limited supply. Recently, the idea of algorithmic collusion was put forward, where the human action in the pricing process is replaced by automated agents. Although experiments have shown that collusive market equilibria can be reached through such techniques, without the need for human intervention, many of the techniques developed remain susceptible to exploitation by other players, making them difficult to implement in practice. In this article, we explore a situation where an agent has a multi-objective strategy, and not only learns to unilaterally exploit market dynamics originating from other algorithmic agents, but also learns to model the behaviour of other agents directly. Our results show how common critiques about the viability of algorithmic collusion in real-life settings can be overcome through the usage of slightly more complex algorithms.

Paper Structure

This paper contains 14 sections, 10 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Iterated Prisoner's Dilemma using two RL agents with different exploration decay rates. Note that in the right plot the $x$-axis is longer than in the other two, as the agents needed more time to converge. The expected payoff under the Nash equilibrium is shown in red, and the expected payoff under the Pareto-optimal outcome in light green.
  • Figure 2: Response function of two agents that have converged in their training. The lines depict the mean across the trained agents; the red shaded area shows where we manually overwrote the actions chosen by the policy of at least one agent.
  • Figure 3: Rewards after training with different exploration rates. Player I has a fixed exploration rate set as $\varepsilon=\exp(-t 10^{-5})$ and we vary the exploration rate of the second player. The dotted lines in green and red represent the symmetric collusion and Nash profits, respectively.
  • Figure 4: Convergence speed in a symmetric market with the tabular approach and a function approximation approach, each run until convergence is reached. The function approximator takes into account the last ten states, which appears to increase the convergence speed.
  • Figure 5: Market outcomes when an online-offline agent is used under adversarial (competitive) and collusive scenarios.
  • ...and 1 more figure
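
The caption of Figure 3 gives Player I's exploration rate as $\varepsilon=\exp(-t 10^{-5})$. As a rough illustration of how such a schedule behaves, the sketch below evaluates it and plugs it into an ε-greedy action choice. The function names, the `beta` parameter, and the use of ε-greedy selection are assumptions made for this example; the captions do not specify how the authors implement exploration.

```python
import math
import random


def epsilon(t, beta=1e-5):
    """Exploration rate exp(-beta * t); beta=1e-5 matches the Figure 3 caption."""
    return math.exp(-beta * t)


def epsilon_greedy(q_values, t, beta=1e-5, rng=random):
    """Pick a random action with probability epsilon(t), else the greedy one.

    Hypothetical helper: q_values is a list of action-value estimates.
    """
    if rng.random() < epsilon(t, beta):
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest estimated value.
    return max(range(len(q_values)), key=q_values.__getitem__)


def steps_until(eps_target, beta=1e-5):
    """Smallest integer step t at which epsilon(t) <= eps_target."""
    return math.ceil(-math.log(eps_target) / beta)


if __name__ == "__main__":
    for t in (0, 100_000, 500_000):
        print(t, round(epsilon(t), 4))
    print("steps until epsilon <= 0.5:", steps_until(0.5))
```

Under this schedule the agent still explores about half the time after roughly 69,000 steps, which gives a sense of why a second player with a different decay rate (the quantity varied in Figure 3) can face a very different learning environment.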