Strategic Usage in a Multi-Learner Setting

Eliot Shekhtman, Sarah Dean

TL;DR

This work focuses on realizable settings, and shows that naive retraining can still lead to oscillation even if all users are observed at different times; however, if this retraining uses memory of past observations, convergent behavior can be guaranteed for certain loss function classes.

Abstract

Real-world systems often involve a pool of users choosing among a set of services. With the growing popularity of online learning algorithms, these services can now self-optimize, leveraging data collected on users to maximize some reward such as service quality. On the flip side, users may strategically choose which services to use in order to pursue their own reward functions, in the process wielding power over which services can see and use their data. Extensive prior research has examined the effects of strategic users in single-service settings, where strategic behavior manifests as the manipulation of observable features to achieve a desired classification; however, such manipulation can be costly or unattainable for users, and this framing fails to capture the full behavior of multi-service dynamic systems. As such, we analyze a setting in which strategic users choose among several available services in order to pursue positive classifications, while services seek to minimize loss functions on their observations. We focus our analysis on realizable settings, and show that naive retraining can still lead to oscillation even if all users are observed at different times; however, if this retraining uses memory of past observations, convergent behavior can be guaranteed for certain loss function classes. We provide results obtained from synthetic and real-world data to empirically validate our theoretical findings.
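
The interaction loop described in the abstract can be sketched roughly as follows. This is an illustrative toy only, not the paper's algorithm: it assumes 1-D features, threshold classifiers, a hypothetical tie-breaking rule (users pick the first service that accepts them, otherwise stay put), and a simple boolean `use_memory` flag in place of the paper's memory parameter $p$. All function and variable names are made up for this example, and the sketch does not attempt to reproduce the oscillation or convergence results.

```python
def fit_threshold(points, default):
    """Return a zero-training-loss threshold for realizable 1-D data.

    points: list of (x, y) with y in {0, 1}; a point is classified
    positive when x >= threshold. The tie-breaking choices below are
    assumptions made for this sketch.
    """
    if not points:
        return default                  # nothing observed: keep old model
    pos = [x for x, y in points if y == 1]
    neg = [x for x, y in points if y == 0]
    if not neg:
        return min(pos)                 # accept every observed point
    if not pos:
        return max(neg) + 1.0           # reject every observed point
    return (max(neg) + min(pos)) / 2.0  # midpoint separator


def simulate(users, thresholds, rounds, use_memory):
    """Alternate user updates and service updates for a number of rounds."""
    thresholds = list(thresholds)
    choice = [0] * len(users)                 # every user starts at service 0
    memory = [set() for _ in thresholds]      # users each service has ever seen
    for _ in range(rounds):
        # User update: move to the first service that classifies the
        # user as positive; stay put if no service does.
        for i, (x, _) in enumerate(users):
            accepting = [j for j, t in enumerate(thresholds) if x >= t]
            if accepting:
                choice[i] = accepting[0]
        # Service update: retrain on current users, optionally together
        # with remembered past observations.
        new_thresholds = []
        for j, t in enumerate(thresholds):
            current = {i for i, c in enumerate(choice) if c == j}
            if use_memory:
                memory[j] |= current
                train_idx = memory[j]
            else:
                train_idx = current
            new_thresholds.append(fit_threshold([users[i] for i in train_idx], t))
        thresholds = new_thresholds
    return thresholds, choice
```

Calling `simulate(users, [0.5, 2.5], rounds=5, use_memory=True)` on a small labelled list such as `[(0.0, 0), (1.0, 0), (2.0, 1), (3.0, 1)]` runs the loop with memory; passing `use_memory=False` gives the naive retraining variant in which each service only sees its current users.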

Paper Structure (25 sections, 15 theorems, 9 equations, 9 figures)

Key Result

Proposition 0

In the memoryless $p=0$ setting, there exist settings in which the state $(H, A)$ never converges.

Figures (9)

  • Figure 1: Five-datapoint example described in the proof of Proposition \ref{prop:nomem-oscil-ex}. Negative points are represented by $-$ and positive points by $+$, where boldface indicates two overlapping points. The dashed and solid lines represent the oscillating classifiers, with the dotted line representing a zero-loss classifier.
  • Figure 2: 5-Points dataset; the top three graphs give the $p=0$ case while the bottom three give $p=0.5$. Service loss is calculated after the user update but before the service update, and usages are displayed for each of the five points with the middle graphs giving the usages for model $j=0$ and the right graphs giving the usages for model $j=1$.
  • Figure 3: Banknote Authentication dataset; each graph gives the positive and negative usages of each of the five models; triangle markers above the lines indicate positive usage while below the lines indicate negative, with colors giving which model the line refers to. The left graph gives the no-memory $p=0$ setting, while the graph on the right gives the $p>0$ setting. Model order, and hence their colors, are meaningless due to the random initialization.
  • Figure 4: Bank Account Fraud dataset; each graph gives the positive and negative usages of each of the five models. Notation is the same as that for Figure 3.
  • Figure 5: 5-Points dataset; the top three graphs give the $p=0.1$ case while the bottom three give $p=1.0$.
  • ...and 4 more figures

Theorems & Definitions (23)

  • Example 1
  • Example 2
  • Example 3
  • Example 4
  • Definition 1
  • Definition 2
  • Proposition 0
  • Lemma 0
  • Proposition 0
  • Proposition 0
  • ...and 13 more