Algorithmic Collusion at Test Time: A Meta-game Design and Evaluation

Yuhong Luo; Daniel Schoepflin; Xintong Wang

TL;DR

The paper examines whether collusion can emerge under rational choices and how agents co-adapt toward cooperation or competition, and presents findings on the feasibility of algorithmic collusion and the effectiveness of pricing strategies in practical "test-time" environments.

Abstract

The threat of algorithmic collusion, and whether it merits regulatory intervention, remains debated, as existing evaluations of its emergence often rely on long learning horizons, assumptions about counterparty rationality in adopting collusive strategies, and symmetry in hyperparameters and economic settings among players. To study collusion risk, we introduce a meta-game design for analyzing algorithmic behavior under test-time constraints. We model agents as possessing pretrained policies with distinct strategic characteristics (e.g., competitive, naively cooperative, robustly collusive), and formulate the problem as selecting a meta-strategy that combines a pretrained, initial policy with an in-game adaptation rule. We seek to examine whether collusion can emerge under rational choices and how agents co-adapt toward cooperation or competition. To this end, we sample normal-form empirical games over meta-strategy profiles, compute relevant game statistics (e.g., payoffs against individuals and regret against an equilibrium mixture of opponents), and construct empirical best-response graphs to uncover strategic relationships. We evaluate both reinforcement-learning and LLM-based strategies in repeated pricing games under symmetric and asymmetric cost settings, and present findings on the feasibility of algorithmic collusion and the effectiveness of pricing strategies in practical "test-time" environments. The source code and the full paper with appendix are available at: https://github.com/chailab-rutgers/CollusionMetagame.
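
The analysis pipeline described in the abstract (an empirical payoff matrix over meta-strategy profiles, regret against an equilibrium mixture, and a best-response graph) can be sketched for a symmetric two-player meta-game as follows. This is a minimal sketch assuming the payoff matrix M has already been estimated by simulation; the function names are hypothetical, and regret is taken in the standard empirical-game sense (best payoff against the mixture sigma minus a strategy's payoff against sigma), which may differ in detail from the paper's Definition 3.3. The paper's weighted best-response scores are not reproduced here.

```python
import numpy as np


def best_response_graph(M):
    """Edges (j, i): row meta-strategy i is a best response to opponent meta-strategy j.

    M[i, j] is the (estimated) expected payoff to a player using meta-strategy i
    against an opponent using j, in a symmetric two-player meta-game.
    """
    edges = []
    for j in range(M.shape[1]):
        col = M[:, j]
        # Keep every maximizer so that ties produce multiple edges.
        for i in np.flatnonzero(np.isclose(col, col.max())):
            edges.append((j, int(i)))
    return edges


def pure_symmetric_ne(M):
    """Meta-strategies that are a best response to themselves (self-loops in the graph)."""
    return [i for i in range(M.shape[0]) if np.isclose(M[i, i], M[:, i].max())]


def regret_against_mixture(M, sigma):
    """Regret of each pure meta-strategy against an opponent mixture sigma."""
    u = M @ np.asarray(sigma, dtype=float)  # payoff of each meta-strategy vs. sigma
    return u.max() - u


# Illustrative 2x2 payoffs (Prisoner's-Dilemma-shaped; not data from the paper):
# index 0 = "cooperate", index 1 = "defect".
M = np.array([[2.0, 0.0],
              [3.0, 1.0]])
print(best_response_graph(M))             # [(0, 1), (1, 1)]
print(pure_symmetric_ne(M))               # [1]
print(regret_against_mixture(M, [0, 1]))  # [1. 0.]
```

Here the only symmetric equilibrium is the self-loop at index 1, and regret against that equilibrium mixture is zero for the equilibrium strategy and positive for the deviation.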

Paper Structure (55 sections, 11 equations, 25 figures, 15 tables, 1 algorithm)

Introduction
Our contribution.
Related Work
Algorithmic collusion.
Strategy selection, adaptation, and evaluation.
Preliminaries
Meta-game Design for Algorithmic Collusion
Initial Policy, Strategy, and Meta-strategy
Initial Policy
Strategy
Meta-strategy
Empirical Game-theoretic Analysis
A Meta-game Evaluation Framework
Game-analysis statistics.
Evaluating Algorithmic Collusion in Repeated Pricing Games
...and 40 more sections

Figures (25)

Figure 1: A toy meta-game for repeated Prisoner's Dilemma on canonical strategies. The NE of this meta-game leads to cooperation. See Example 3.1 for more discussion.
Figure 2: Running averages of CoI over 100 rounds (Left) and accumulated policy update counts for strategy pairs over 10,000 rounds (Right). Shaded regions indicate 95% confidence intervals. Each curve represents the mean over 20 strategy pairs and 100 random initial states. INIT denotes Q-learning agents trained from scratch, with Q-values initialized to those corresponding to opponents playing uniformly random pricing strategies. All strategies except INIT and pretrained pairs use $\alpha = 0.5$ and $f = 1$.
Figure 3: The best-response graph for Q-learning with $c_1=c_2=1$, evaluated at $t=10,000$. The edge weights correspond to the sum of best-response scores across all meta-games, as discussed in Sec. \ref{sec:metric}.
Figure 4: We collect 12 pairs of pretrained Q-learning policies with 15 discretized actions and use them as initial policies for test-time evaluation. We measure total payoffs from $t=0$ to $t=100$, averaged over 100 random seeds. In green, each policy is paired with its original pretraining pair. In red, each pretrained policy ($\pi_j$) is paired with its best response ($\pi_{-j}$). In orange and blue, we show payoffs for randomly sampled, independently pretrained pairs without adaptation: orange points lie closer (in L2 distance) to the red cluster, while blue points are closer to the green.
Figure 5: With 4 discretized actions, the 500 pretrained Q-learning policies form three distinct clusters when laid out along PC (Def. 3.1) and CR (Def. 3.2), corresponding to the competitive (LC), naively collusive (C), and robustly collusive (RC) categories introduced in Sec. \ref{exp:meta}.
...and 20 more figures
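
Figure 1 and Example 3.1 refer to a toy meta-game for the repeated Prisoner's Dilemma over canonical strategies. As a minimal sketch of how such a meta-game can be built, the code below simulates every pair from a hypothetical strategy set (ALLC, ALLD, TFT, GRIM) for an assumed 100 rounds with assumed stage payoffs T=5, R=3, P=1, S=0, records average per-round payoffs in a matrix, and lists the meta-strategies that are best responses to themselves. The strategy set, horizon, and payoffs are illustrative assumptions, not the paper's Example 3.1.

```python
import numpy as np

# Hypothetical stage-game payoffs (T=5, R=3, P=1, S=0): PAYOFF[(mine, theirs)].
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

# Hypothetical canonical strategies: each maps (my history, opponent history) to an action.
STRATEGIES = {
    "ALLC": lambda mine, theirs: "C",
    "ALLD": lambda mine, theirs: "D",
    "TFT": lambda mine, theirs: theirs[-1] if theirs else "C",
    "GRIM": lambda mine, theirs: "D" if "D" in theirs else "C",
}


def average_payoff(s1, s2, rounds=100):
    """Average per-round payoff to s1 when it plays s2 in the repeated game."""
    h1, h2, total = [], [], 0
    for _ in range(rounds):
        # Both players choose simultaneously from the history so far.
        a1, a2 = s1(h1, h2), s2(h2, h1)
        total += PAYOFF[(a1, a2)]
        h1.append(a1)
        h2.append(a2)
    return total / rounds


def meta_game(strategies, rounds=100):
    """Empirical payoff matrix M[r, c] = payoff to strategy r against strategy c."""
    names = list(strategies)
    M = np.array([[average_payoff(strategies[r], strategies[c], rounds) for c in names]
                  for r in names])
    return names, M


def symmetric_pure_ne(names, M):
    """Meta-strategies that are a best response to themselves."""
    return [n for k, n in enumerate(names) if np.isclose(M[k, k], M[:, k].max())]


names, M = meta_game(STRATEGIES)
print(symmetric_pure_ne(names, M))  # ['ALLD', 'TFT', 'GRIM']
```

In this hypothetical set the reciprocal strategies TFT and GRIM sustain mutual cooperation as symmetric equilibria, but ALLD is an equilibrium too, so the outcome of such a meta-game depends on which strategies are included and how the equilibrium is selected.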

Theorems & Definitions (11)

Definition 2.1: Collusion Index (CoI)
Definition 2.2: State-Value Function
Definition 3.1: Paired Cooperativeness (PC)
Definition 3.2: Cooperative Robustness (CR)
Example 3.1: A toy meta-game for repeated Prisoner's Dilemma
Definition 3.3: NE-Regret
Definition 4.1: Logit Demand Model
Definition 4.2: Q-decoding Rule $\phi(Z_{t})$
Definition 4.3: Q-learning Update Rule $\omega(Z_{t}, S_t)$
Definition 4.4: UCB Decoding Rule $\phi(Z_t)$
...and 1 more
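
Definitions 2.1 and 4.1 to 4.4 are listed by name only. The block below is a minimal sketch of a two-firm logit pricing environment, a collusion index, tabular Q-learning, and UCB action selection, under stated assumptions: the demand form and baseline parameters (a_i = 2, a_0 = 0, mu = 0.25, c_i = 1) follow the widely used setup of Calvano et al. (2020); the CoI is assumed to be the normalized profit gain (pi - pi_N) / (pi_M - pi_N); and the Q-learning and UCB rules are their textbook forms. The paper's own definitions (for example, the state encoding behind phi(Z_t) and omega(Z_t, S_t), or the parameter f in Figure 2) may differ, and every function name here is hypothetical.

```python
import numpy as np

# Assumed baseline parameters (Calvano et al., 2020 setup), not necessarily this paper's.
A = np.array([2.0, 2.0])      # product qualities a_i
A0 = 0.0                      # outside-good quality a_0
MU = 0.25                     # horizontal differentiation mu
COST = np.array([1.0, 1.0])   # symmetric marginal costs c_1 = c_2 = 1
GRID = np.linspace(1.0, 2.5, 1501)  # candidate prices, step 0.001


def logit_demand(p):
    """q_i = exp((a_i - p_i)/mu) / (sum_j exp((a_j - p_j)/mu) + exp(a_0/mu))."""
    e = np.exp((A - np.asarray(p, dtype=float)) / MU)
    return e / (e.sum() + np.exp(A0 / MU))


def profits(p):
    """Per-firm stage profits (p_i - c_i) * q_i."""
    return (np.asarray(p, dtype=float) - COST) * logit_demand(p)


def best_response(i, p_other):
    """Grid-search best response of firm i (0 or 1) to the rival's price."""
    e_i = np.exp((A[i] - GRID) / MU)
    e_j = np.exp((A[1 - i] - p_other) / MU)
    q_i = e_i / (e_i + e_j + np.exp(A0 / MU))
    return GRID[np.argmax((GRID - COST[i]) * q_i)]


def nash_prices(iters=200):
    """Static Nash benchmark via simultaneous best-response iteration."""
    p = COST.copy()
    for _ in range(iters):
        new = np.array([best_response(0, p[1]), best_response(1, p[0])])
        if np.allclose(new, p):
            break
        p = new
    return p


def monopoly_price():
    """Symmetric price maximizing joint profit (the collusive benchmark)."""
    joint = [profits([g, g]).sum() for g in GRID]
    return GRID[int(np.argmax(joint))]


def collusion_index(pi, pi_nash, pi_mono):
    """Assumed CoI: 0 at the Nash profit, 1 at the joint-monopoly profit."""
    return (pi - pi_nash) / (pi_mono - pi_nash)


def q_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.95):
    """Textbook tabular Q-learning step, applied in place."""
    target = r + gamma * Q[s_next].max()
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * target
    return Q


def ucb_action(q_row, counts, t, c=1.0):
    """UCB1-style choice: untried actions first, else argmax of Q + c*sqrt(ln t / n)."""
    counts = np.asarray(counts, dtype=float)
    untried = np.flatnonzero(counts == 0)
    if untried.size:
        return int(untried[0])
    bonus = c * np.sqrt(np.log(t) / counts)
    return int(np.argmax(np.asarray(q_row, dtype=float) + bonus))


p_nash = nash_prices()
p_mono = monopoly_price()
pi_nash = profits(p_nash).mean()
pi_mono = profits([p_mono, p_mono]).mean()
print(p_nash, p_mono)
print(collusion_index(profits([1.7, 1.7]).mean(), pi_nash, pi_mono))
```

Under these assumed parameters, best-response iteration settles near p ≈ 1.47 and the joint-profit maximum lies near p ≈ 1.93, so a symmetric price of 1.7 yields a CoI strictly between 0 and 1.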
