Self-Interested Agents in Collaborative Machine Learning: An Incentivized Adaptive Data-Centric Framework
Nithia Vijayan, Bryan Kian Hsiang Low
TL;DR
This work tackles online batch data-centric collaborative learning with self-interested agents coordinated by an arbiter. It proposes a bilevel framework in which agents learn stochastic data-sharing policies $\pi^i_{\nu}$ to influence data contributions, while the arbiter jointly optimizes model parameters $\theta$ and agent weights $\omega$ via $h_\omega(i)$ and distortion-based incentives, producing per-agent models $\theta^i=\theta+\eta^i$. The authors establish non-asymptotic convergence guarantees: agent updates converge to approximate stationary points of $G^i(\nu)$ at rate $\mathcal{O}(1/\sqrt{T^i})$, and arbiter updates converge to an approximate stationary point of the expected loss $J(\theta)$ at rate $\mathcal{O}(T^{-2/5})$, under standard smoothness and boundedness assumptions. The framework enables adaptive, incentive-compatible collaboration across heterogeneous data sources and can be extended with differential privacy to enhance practicality and trust in real-world deployments.
Abstract
We propose a framework for adaptive data-centric collaborative machine learning among self-interested agents, coordinated by an arbiter. Designed to handle the incremental nature of real-world data, the framework operates in an online manner: at each time step, the arbiter collects a batch of data from agents, trains a machine learning model, and provides each agent with a distinct model reflecting its data contributions. This setup establishes a feedback loop where shared data influence model updates, and the resulting models guide future data-sharing policies. Agents evaluate and partition their data, selecting a partition to share using a stochastic parameterized policy, learned via policy gradient methods to optimize the utility of the received model as defined by agent-specific evaluation functions. On the arbiter side, the expected loss function over the true data distribution is optimized, incorporating agent-specific weights to account for distributional differences arising from diverse sources and selective sharing. A bilevel optimization algorithm jointly learns the model parameters and agent-specific weights. Mean-zero noise, computed using a distortion function that adjusts these agent-specific weights, is introduced to generate distinct agent-specific models, promoting valuable data sharing without requiring separate training. Our framework is underpinned by non-asymptotic analyses, ensuring convergence of the agent-side policy optimization to an approximate stationary point of the evaluation functions and convergence of the arbiter-side optimization to an approximate stationary point of the expected loss function.
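To make the feedback loop described above concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a softmax policy over a fixed number of data partitions, a plain score-function (REINFORCE) update on the agent side, and Gaussian mean-zero noise on the arbiter side. The names `reinforce_step`, `agent_models`, `utility_fn`, and `distortion`, and the specific choice of a noise scale that shrinks as an agent's weight grows, are hypothetical and chosen only for illustration; the abstract does not specify these forms.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def reinforce_step(nu, utility_fn, rng, lr=0.1, baseline=0.0):
    """One score-function ascent step on policy parameters nu.

    Assumption: the agent's policy is a softmax over K partitions and
    utility_fn(k) is the agent's evaluation of the model it receives
    after sharing partition k (a stand-in for the paper's evaluation function).
    """
    probs = softmax(nu)
    k = rng.choice(len(nu), p=probs)   # sample which partition to share
    reward = utility_fn(k)
    grad_log = -probs                  # grad of log softmax: one_hot(k) - probs
    grad_log[k] += 1.0
    return nu + lr * (reward - baseline) * grad_log, k


def agent_models(theta, weights, distortion, rng):
    """Return per-agent models theta^i = theta + eta^i.

    Assumption: eta^i is Gaussian with mean zero and standard deviation
    distortion(weight_i); the actual distortion function is not given here.
    """
    return [theta + distortion(w) * rng.standard_normal(theta.shape)
            for w in weights]


# Toy usage: one agent with three partitions of (hypothetical) increasing value.
rng = np.random.default_rng(0)
nu = np.zeros(3)
for _ in range(500):
    nu, _ = reinforce_step(nu, lambda k: [0.1, 0.5, 1.0][k], rng)
print("learned sharing policy:", softmax(nu))

# Arbiter side: agents with larger weights receive less-perturbed models.
theta = np.ones(4)
models = agent_models(theta, [0.2, 0.8], lambda w: 0.5 * (1.0 - w), rng)
print("per-agent models:", models)
```

Because the noise is mean-zero, averaging the per-agent models recovers the shared parameters in expectation, so a single training run can serve all agents; how closely an agent's model tracks $\theta$ is what the distortion function controls in this sketch.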
