Statistical Collusion by Collectives on Learning Platforms
Etienne Gauthier, Francis Bach, Michael I. Jordan
TL;DR
This paper studies how collectives can statistically influence learning platforms through coordinated data modifications, formalizing a model in which a collective of relative size $\alpha = n/N$ applies a data-transforming strategy $h$. It introduces a statistical-inference framework that enables finite-sample computation of bounds for optimizing strategies toward three goals: signal planting, signal unplanting, and signal erasing, with two planting variants and adaptive approaches to unplanting. The authors derive computable, finite-sample lower bounds that reveal staircase-like behavior across a finite signal set and show that a collective's absolute size matters alongside its relative size. They validate the theory on a synthetic product-evaluation domain and compare against prior infinite-data bounds, finding tighter guarantees. The work contributes practical tools for anticipating a collective's impact on platform behavior, which can guide the design of more robust and transparent learning systems in the presence of strategic data injection, and it points to future directions in concentration inequalities, regression settings, and population heterogeneity.
Abstract
As platforms increasingly rely on learning algorithms, collectives may form and seek ways to influence these platforms to align with their own interests. This can be achieved by coordinated submission of altered data. To evaluate the potential impact of such behavior, it is essential to understand the computations that collectives must perform to impact platforms in this way. In particular, collectives need to make a priori assessments of the effect of the collective before taking action, as they may face potential risks when modifying their data. Moreover, they need to develop implementable coordination algorithms based on quantities that can be inferred from observed data. We develop a framework that provides a theoretical and algorithmic treatment of these issues and present experimental results in a product evaluation domain.
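To make the setup concrete, the following is a minimal sketch of what "coordinated submission of altered data" could look like for signal planting. It assumes, as an interpretation not spelled out here, that $N$ is the number of participants on the platform and $n$ the number in the collective, so that $\alpha = n/N$. The function names (`plant_signal`, `apply_collective`), the marker string, and the star-rating labels are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import random

def plant_signal(record, target_label="5 stars", marker="[signal]"):
    """Hypothetical strategy h: tag the features with a marker and set a target label."""
    features, _label = record
    return (features + " " + marker, target_label)

def apply_collective(population, n, h, seed=0):
    """Return the data the platform would observe when n randomly chosen
    participants (the collective) apply h and everyone else submits unchanged data."""
    rng = random.Random(seed)
    members = set(rng.sample(range(len(population)), n))
    return [h(rec) if i in members else rec for i, rec in enumerate(population)]

# Assumed toy population: N product reviews, all originally rated "3 stars".
N = 1000
n = 50
population = [(f"review {i}", "3 stars") for i in range(N)]

observed = apply_collective(population, n, plant_signal)
alpha = n / N  # relative size of the collective

# Count how many observed records carry the planted label.
planted_count = sum(label == "5 stars" for _, label in observed)
print(f"alpha = {alpha}, planted records = {planted_count} of {N}")
```

In this toy setting the share of planted records equals $\alpha$ by construction. The question the paper addresses is harder: what a collective can guarantee about the platform's learned behavior from finitely many observations.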
