Learning from Streaming Data when Users Choose
Jinyan Su, Sarah Dean
TL;DR
The paper studies learning in markets where streaming user data and user choice among services interact, so that the data distribution each service sees is non-stationary and driven by feedback. It introduces Multi-learner Streaming Gradient Descent (MSGD), a decentralized algorithm in which only the chosen model is updated, using the loss of the single arriving user; the sub-populations induced by user choice are used in the analysis. The authors prove that the overall loss $f(\Theta)$ converges almost surely and that the iterates converge to stationary points under standard stochastic optimization assumptions. They validate the approach on MovieLens-10M and census data. The results indicate that decentralized, streaming updates can adapt to evolving user preferences and improve performance on sub-populations, while exposing a tension between specialization and universal coverage in markets with multiple competing providers.
Abstract
In digital markets comprising many competing services, each user chooses between multiple service providers according to their preferences, and the chosen service makes use of the user data to incrementally improve its model. The service providers' models influence which service the user will choose at the next time step, and the user's choice, in turn, influences the model update, leading to a feedback loop. In this paper, we formalize the above dynamics and develop a simple and efficient decentralized algorithm to locally minimize the overall user loss. Theoretically, we show that our algorithm asymptotically converges to stationary points of the overall loss almost surely. We also experimentally demonstrate the utility of our algorithm with real-world data.
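The feedback loop described above can be illustrated with a minimal sketch. It is not the paper's exact algorithm: it assumes linear models, squared loss, and a user who always picks the service with the lowest loss on their own data, none of which is stated in the abstract. The names `msgd_sketch` and `overall_loss`, and the step size and step count, are hypothetical choices made for illustration.

```python
import numpy as np


def msgd_sketch(X, y, n_learners=2, n_steps=5000, lr=0.05, seed=0):
    """Streaming updates where only the chosen learner takes a gradient step.

    Assumptions (not from the source): linear models, squared loss,
    and users who choose the learner with the lowest loss on their data.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # One parameter vector per service provider, small random start.
    thetas = rng.normal(scale=0.1, size=(n_learners, d))
    for _ in range(n_steps):
        i = rng.integers(n)                # a single user arrives
        x_i, y_i = X[i], y[i]
        residuals = thetas @ x_i - y_i     # one residual per learner
        losses = 0.5 * residuals ** 2
        k = int(np.argmin(losses))         # assumed choice rule: lowest loss
        # Only the chosen learner updates, using this user's loss gradient.
        thetas[k] -= lr * residuals[k] * x_i
    return thetas


def overall_loss(X, y, thetas):
    """Average loss when every user is served by their best learner."""
    residuals = X @ thetas.T - y[:, None]  # shape (n_users, n_learners)
    return float(np.mean(np.min(0.5 * residuals ** 2, axis=1)))
```

Under these assumptions, each learner only ever sees the users who chose it, so the learners can drift toward different sub-populations; comparing `overall_loss` before and after training shows whether the streaming updates reduced the loss of the market as a whole.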
