PPA-Game: Characterizing and Learning Competitive Dynamics Among Online Content Creators
Renzhe Xu, Haotian Wang, Xingxuan Zhang, Bo Li, Peng Cui
TL;DR
This work introduces the Proportional Payoff Allocation Game (PPA-Game) to model competition among $N$ online content creators over $K$ topics, where topic payoffs are shared proportionally according to creator-topic weights $w_{j,k}$. It analyzes Pure Nash Equilibria (PNE), proving existence under broad conditions and bounding the Price of Anarchy when PNE exist, while highlighting potential non-uniqueness and inefficiency of equilibria. Building on PPA-Game, the authors develop a decentralized Multi-player Multi-Armed Bandit (MPMAB) framework with an online learning algorithm that achieves a regret of $O(\log^{1+\eta} T)$ for any $\eta>0$, and validate performance through extensive synthetic experiments. The results offer a principled approach to understanding and guiding long-run competitive dynamics among content creators in recommender systems, with implications for stability and fairness in exposure distribution.
Abstract
In this paper, we present the Proportional Payoff Allocation Game (PPA-Game), which characterizes situations where agents compete for divisible resources. In the PPA-Game, each agent selects one of the available resources, and the payoff of each resource is divided among the agents who selected it in proportion to their heterogeneous weights. Such dynamics model content creators on online recommender systems like YouTube and TikTok, who compete for finite consumer attention, with content exposure depending on the inherent and distinct quality of each creator. We first conduct a game-theoretical analysis of the PPA-Game. Although the PPA-Game does not always admit a pure Nash equilibrium (PNE), we identify prevalent scenarios that ensure its existence. Simulated experiments further indicate that cases where no PNE exists are rare. Beyond analyzing static payoffs, we further study how agents learn resource payoffs online by integrating a multi-player multi-armed bandit framework. We propose an online algorithm that enables each agent to maximize its cumulative payoff over $T$ rounds. Theoretically, we establish that the regret of any agent is bounded by $O(\log^{1 + \eta} T)$ for any $\eta > 0$. Empirical results further validate the effectiveness of our online learning approach.
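The proportional sharing rule and the notion of a PNE can be illustrated with a small sketch. It assumes one plausible formalization that the abstract does not spell out: creator $j$ choosing topic $k$ receives $\mu_k \, w_{j,k} / \sum_{j'} w_{j',k}$, where the sum runs over creators who chose $k$, $\mu_k$ is the total payoff of topic $k$, and all weights are strictly positive. The names `payoffs`, `is_pne`, `find_pne`, `mu`, and `w` are hypothetical, and the brute-force search is only practical for very small $N$ and $K$; it is not the authors' algorithm.

```python
from itertools import product

def payoffs(choice, mu, w):
    """Payoff of each creator under an assumed proportional sharing rule.

    choice[j] is the topic picked by creator j, mu[k] is the total payoff
    of topic k, and w[j][k] > 0 is the weight of creator j on topic k.
    """
    totals = [0.0] * len(mu)
    for j, k in enumerate(choice):
        totals[k] += w[j][k]  # total weight competing on topic k
    return [mu[k] * w[j][k] / totals[k] for j, k in enumerate(choice)]

def is_pne(choice, mu, w, tol=1e-12):
    """True if no creator can strictly gain by switching topics alone."""
    base = payoffs(choice, mu, w)
    for j in range(len(choice)):
        for k in range(len(mu)):
            if k == choice[j]:
                continue
            deviated = list(choice)
            deviated[j] = k
            if payoffs(deviated, mu, w)[j] > base[j] + tol:
                return False
    return True

def find_pne(mu, w):
    """Enumerate all K**N pure strategy profiles and keep the equilibria."""
    n, k = len(w), len(mu)
    return [c for c in product(range(k), repeat=n) if is_pne(c, mu, w)]

# Two identical creators, two topics with total payoffs 1.0 and 0.6.
mu = [1.0, 0.6]
w = [[1.0, 1.0], [1.0, 1.0]]
print(find_pne(mu, w))  # [(0, 1), (1, 0)]
```

In this toy instance both creators sharing topic 0 earn 0.5 each, so either one prefers to move to topic 1 and earn 0.6. The two resulting equilibria differ only in which creator gets the better topic, a small example of the non-uniqueness mentioned in the TL;DR.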
