Learning from Streaming Data when Users Choose

Jinyan Su; Sarah Dean

TL;DR

The paper studies learning in markets where streaming user data and user choice among services interact to create a non-stationary, feedback-driven distribution. It introduces Multi-learner Streaming Gradient Descent (MSGD), a decentralized algorithm that updates only the chosen model, using the loss of the single user who chose it; the sub-populations induced by user choice are used in the analysis. The authors prove that the overall loss $f(\Theta)$ converges almost surely and that the iterates converge to stationary points under standard stochastic optimization assumptions. They validate the approach on MovieLens-10M and census data. The results indicate that decentralized, streaming updates can adapt to evolving user preferences and improve performance on each service's subpopulation, while exposing a tension between specialization and universal coverage in markets with multiple competing providers.
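The update rule described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the squared loss, the choice rule (with probability `zeta` the user picks a service uniformly at random, otherwise the one with the lowest loss), the step-size schedule, and all function names are assumptions made for illustration only.

```python
import random


def squared_loss(theta, x, y):
    """Squared loss of a linear model theta on one user (x, y). Assumed loss."""
    pred = sum(t * xi for t, xi in zip(theta, x))
    return 0.5 * (pred - y) ** 2


def squared_loss_grad(theta, x, y):
    """Gradient of the squared loss with respect to theta."""
    pred = sum(t * xi for t, xi in zip(theta, x))
    return [(pred - y) * xi for xi in x]


def choose_service(thetas, x, y, zeta, rng):
    """Assumed choice rule: random service with probability zeta,
    otherwise the service whose current model has the lowest loss."""
    if rng.random() < zeta:
        return rng.randrange(len(thetas))
    losses = [squared_loss(th, x, y) for th in thetas]
    return min(range(len(thetas)), key=losses.__getitem__)


def msgd(sample_user, thetas, steps, zeta=0.0, seed=0):
    """Streaming loop: one user arrives per step, picks a service,
    and only that service takes a gradient step on that user's loss."""
    rng = random.Random(seed)
    thetas = [list(th) for th in thetas]  # copy so the inputs are not mutated
    for t in range(1, steps + 1):
        x, y = sample_user(rng)
        i = choose_service(thetas, x, y, zeta, rng)
        lr = 0.5 / (t ** 0.75)  # hypothetical decaying step size
        grad = squared_loss_grad(thetas[i], x, y)
        thetas[i] = [th - lr * g for th, g in zip(thetas[i], grad)]
    return thetas
```

The loop is decentralized in the sense that no service ever sees a user who chose a competitor, so each model tends to specialize on the sub-population it attracts.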

Abstract

In digital markets composed of many competing services, each user chooses between multiple service providers according to their preferences, and the chosen service uses the user's data to incrementally improve its model. The service providers' models influence which service the user will choose at the next time step, and the user's choice, in turn, influences the model update, leading to a feedback loop. In this paper, we formalize these dynamics and develop a simple and efficient decentralized algorithm to locally minimize the overall user loss. Theoretically, we show that our algorithm asymptotically converges to stationary points of the overall loss almost surely. We also experimentally demonstrate the utility of our algorithm with real-world data.

Paper Structure (37 sections, 14 theorems, 101 equations, 10 figures, 1 table, 2 algorithms)

Introduction
Related Work
Problem setting
User-Service Interaction Dynamics
Learning Objective
Multi-learner Streaming Gradient Descent
Properties of Learning Objective
Multi-Learner Streaming Gradient Descent
Gradient of learning objective $\nabla f(\Theta)$
Asymptotic Convergence Analysis
Convergence of $f(\Theta)$
Convergence of iterates $\Theta$
Experiments
Experimental Settings
Results
...and 22 more sections

Key Result

Lemma 4.3

For the learning objective $f(\Theta)$ defined in Eq. (2), the gradient with respect to $\theta_i$ is:

Figures (10)

Figure 1: Example of $f_{\text{PR}}(\Theta)$ being non-convex.
Figure 2: Convergence of objective function $f(\Theta)$ under MSGD or Full Information with $k=3$ services in the movie recommendation (left) and census data (right) tasks.
Figure 3: Convergence of iterates $\Theta$ under MSGD or Full Information with $k=3$ services in the movie recommendation (left) and census data (right) tasks. For MSGD, we show results for $\zeta = 0, 0.2, 0.5, 0.8, 1$ respectively.
Figure 4: Accuracy of MSGD or Full Information on the model-specific subpopulation $\mathcal{D}_i(\Theta)$ (left) and whole population $\mathcal{P}$ (right) for the ACSEmployment task on census data with perfectly rational users ($\zeta = 0$). For MSGD, we illustrate results of different total number of services $k =2, 4,6$.
Figure 5: Accuracy of MSGD or Full Information in the census data (right) with fairly rational users and different total number of services $k$. The plot displays mean and standard deviation over three trials.
...and 5 more figures

Theorems & Definitions (35)

Definition 4.1
Remark 4.2
Lemma 4.3
Proof Sketch
Remark 4.4
Definition 5.1
Theorem 5.2
Lemma 5.3
Lemma 5.4
Definition 1.1
...and 25 more
