Replication-proof Bandit Mechanism Design with Bayesian Agents
Suho Shin, Seyed A. Esmaeili, MohammadTaghi Hajiaghayi
TL;DR
This work addresses replication-proof mechanism design in Bayesian bandit settings where agents know only the prior distribution of their arms' means. It introduces random-permutation regret, along with truthfulness under random permutation (TRP) and permutation invariance (PI), as core criteria for replication-proofness, and shows that an exploration-then-commit (ETC) approach is replication-proof in the single-agent case. Building on this, the authors design a hierarchical ETC-based multi-agent algorithm (H-ETC-R) with a restarting round to maintain replication-proofness while achieving sublinear regret. The results generalize Shin et al. (2022) to Bayesian agents and offer a practical, incentive-compatible algorithm with regret that scales sublinearly in the time horizon and number of arms, with implications for replication control in real-world platforms.
Abstract
We study the problem of designing replication-proof bandit mechanisms when agents strategically register or replicate their own arms to maximize their payoff. Specifically, we consider Bayesian agents who know only the distribution from which their own arms' mean rewards are sampled, unlike the original setting of Shin et al. (2022). Interestingly, and in stark contrast to the previous work, analyzing the replication-proofness of an algorithm with Bayesian agents becomes significantly more complicated, even in the single-agent setting. We provide necessary and sufficient conditions for an algorithm to be replication-proof in the single-agent setting, and present an algorithm that satisfies these conditions. These results center on several analytical theorems that compare the expected regret of multiple bandit instances; to the best of our knowledge such comparisons have not been studied before, so they may be of independent interest. We extend this result to the multi-agent setting and provide a replication-proof algorithm for any problem instance. We conclude by proving a sublinear regret upper bound for this algorithm that matches that of Shin et al. (2022).
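For readers unfamiliar with exploration-then-commit (ETC), the building block named in the TL;DR, the following is a minimal sketch of the textbook single-agent procedure. It is not the authors' H-ETC-R algorithm and does not model replication or agent strategy. The function name `explore_then_commit`, the parameter `pulls_per_arm`, and the choice of Bernoulli rewards are assumptions made here for illustration.

```python
import random


def explore_then_commit(arm_means, horizon, pulls_per_arm, rng):
    """Textbook ETC on Bernoulli arms (illustrative, not the paper's H-ETC-R).

    Returns (committed_arm, total_reward).
    """
    n_arms = len(arm_means)
    sums = [0.0] * n_arms    # cumulative reward observed per arm
    counts = [0] * n_arms    # number of pulls per arm
    total_reward = 0.0
    t = 0

    def pull(arm):
        # Bernoulli reward with the arm's (hidden) mean; an assumption of this sketch.
        return 1.0 if rng.random() < arm_means[arm] else 0.0

    # Exploration phase: pull every arm pulls_per_arm times, round-robin,
    # stopping early if the horizon runs out.
    for _ in range(pulls_per_arm):
        for arm in range(n_arms):
            if t >= horizon:
                break
            reward = pull(arm)
            sums[arm] += reward
            counts[arm] += 1
            total_reward += reward
            t += 1

    # Commit phase: pick the arm with the highest empirical mean
    # (ties go to the lowest index) and play it for all remaining rounds.
    def empirical_mean(arm):
        return sums[arm] / counts[arm] if counts[arm] > 0 else 0.0

    committed_arm = max(range(n_arms), key=empirical_mean)
    while t < horizon:
        total_reward += pull(committed_arm)
        t += 1

    return committed_arm, total_reward


if __name__ == "__main__":
    arm, reward = explore_then_commit([0.3, 0.5, 0.7], horizon=1000,
                                      pulls_per_arm=30, rng=random.Random(0))
    print("committed arm:", arm, "total reward:", reward)
```

Because the commit decision depends only on the empirical means gathered during a fixed exploration phase, the procedure is simple to analyze, which is the usual reason ETC is used as a starting point for mechanism design arguments.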
