User Strategization and Trustworthy Algorithms

Sarah H. Cen; Andrew Ilyas; Aleksander Madry

TL;DR

This paper investigates the mismatch between data-driven platforms and human users who can strategically respond to platform prompts. By modeling the interaction as a repeated Stackelberg-style game with Bayesian belief updates, it shows that user strategization can improve short-run platform payoffs but distorts data and undermines counterfactual inference, revealing a tension between adaptivity and exogenous-data assumptions. It then formalizes a notion of κ-trustworthy algorithms that both discourage strategization and guarantee a minimum user payoff, and proposes practical interventions (offering multiple algorithms and feedback mechanisms) to enhance trust and data reliability. The results highlight that trustworthy design can align user and platform incentives, improving long-run payoffs and the quality of learned models, while also clarifying why naive trust-boosting approaches may fall short. Overall, the work connects trust, data exogeneity, and counterfactual reasoning, offering a formal framework and interventions for designing more robust, user-aligned data-driven systems.
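The repeated interaction with Bayesian belief updates can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: behaviors are binary ($B \in \{0, 1\}$, matching the $q(B=1|Z)$ notation in the key result), each hypothesis $\hat{q}_i$ is a hypothetical function returning $\hat{q}_i(B=1|Z)$ strictly between 0 and 1, and the names `bayes_update`, `play`, and `propose` are invented for illustration. It covers only the belief-update loop of Figure 1, not the commitment step or κ-trustworthiness.

```python
import random
from typing import Callable, List

# A hypothesis maps a proposition Z to the modeled click probability q_hat(B=1 | Z).
Hypothesis = Callable[[str], float]


def bayes_update(belief: List[float], hypotheses: List[Hypothesis], z: str, b: int) -> List[float]:
    """One Bayesian step: mu_{t+1}(q_i) is proportional to mu_t(q_i) * q_i(B=b | Z=z).

    The platform's own proposition probability p(z; mu_t) is identical under every
    hypothesis, so it cancels in the normalization and is omitted here.
    """
    weights = [w * (h(z) if b == 1 else 1.0 - h(z)) for w, h in zip(belief, hypotheses)]
    total = sum(weights)
    return [w / total for w in weights]


def play(user: Hypothesis, hypotheses: List[Hypothesis],
         propose: Callable[[List[float], random.Random], str],
         prior: List[float], rounds: int, rng: random.Random) -> List[float]:
    """Run the repeated interaction and return the platform's final belief."""
    belief = list(prior)
    for _ in range(rounds):
        z = propose(belief, rng)                    # Z_t ~ p(. ; mu_t)
        b = 1 if rng.random() < user(z) else 0      # B_t ~ q(. | Z_t)
        belief = bayes_update(belief, hypotheses, z, b)
    return belief


# Toy usage; every number here is an illustrative assumption.
hypotheses: List[Hypothesis] = [
    lambda z: 0.9 if z == "a" else 0.2,  # hat q_1: prefers proposition "a"
    lambda z: 0.2 if z == "a" else 0.9,  # hat q_2: prefers proposition "b"
]


def propose(belief: List[float], rng: random.Random) -> str:
    # Hypothetical algorithm p: show "a" with probability mu_t(hat q_1).
    return "a" if rng.random() < belief[0] else "b"


final = play(hypotheses[0], hypotheses, propose, prior=[0.5, 0.5], rounds=200, rng=random.Random(0))
print(final)  # almost all weight on hat q_1, since the user behaves exactly like hat q_1
```

Because $p(Z_t;\mu_t)$ cancels across hypotheses, `bayes_update` needs only the sampled proposition and behavior, never the algorithm's probabilities.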

Abstract

Many human-facing algorithms -- including those that power recommender systems or hiring decision tools -- are trained on data provided by their users. The developers of these algorithms commonly adopt the assumption that the data generating process is exogenous: that is, how a user reacts to a given prompt (e.g., a recommendation or hiring suggestion) depends on the prompt and not on the algorithm that generated it. For example, the assumption that a person's behavior follows a ground-truth distribution is an exogeneity assumption. In practice, when algorithms interact with humans, this assumption rarely holds because users can be strategic. Recent studies document, for example, TikTok users changing their scrolling behavior after learning that TikTok uses it to curate their feed, and Uber drivers changing how they accept and cancel rides in response to changes in Uber's algorithm. Our work studies the implications of this strategic behavior by modeling the interactions between a user and their data-driven platform as a repeated, two-player game. We first find that user strategization can actually help platforms in the short term. We then show that it corrupts platforms' data and ultimately hurts their ability to make counterfactual decisions. We connect this phenomenon to user trust, and show that designing trustworthy algorithms can go hand in hand with accurate estimation. Finally, we provide a formalization of trustworthiness that inspires potential interventions.

Paper Structure (79 sections, 24 theorems, 113 equations, 5 figures, 1 table)


Introduction
Contributions.
Summary of contributions
A game-theoretic model of data-driven algorithms.
Users strategize in order to optimize their long-term outcomes.
Main results: User strategization can both help and hurt the platform.
Trustworthy design can mitigate user strategization.
Related work
Model
Setup
Generating propositions.
Bayesian updating.
Committing to strategies.
Examples
User strategization
...and 64 more sections

Key Result

Proposition 5.0

Let $\widehat{\mathcal{Q}}$ be the hypothesis class defined by Eq. (parameters). Consider a user strategy $q$ and a platform algorithm $p$ such that $p(\cdot\,;\mu)$ has full support for all $\mu \in \Delta(\widehat{\mathcal{Q}})$. Let $\text{supp}(q) = \{Z \in \mathcal{Z}: q(B=1|Z) > 0\}$. If … In other words, the platform's limiting belief is $\mu_\infty = \delta_{\hat{q}_{i^*}}$.
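The point-mass limit has a familiar source in the posterior log-odds. As a hedged sketch (the standard identity for Bayesian updating, not the paper's proof), assume the platform updates $\mu_t$ by Bayes' rule on the observed pairs $(Z_s, B_s)$. The likelihood of a pair under hypothesis $\hat{q}_i$ is $p(Z_s; \mu_s)\,\hat{q}_i(B_s | Z_s)$, and the first factor is the same for every hypothesis, so it cancels when two hypotheses are compared:

$$
\log\frac{\mu_t(\hat{q}_i)}{\mu_t(\hat{q}_j)}
= \log\frac{\mu_0(\hat{q}_i)}{\mu_0(\hat{q}_j)}
+ \sum_{s=0}^{t-1} \log\frac{\hat{q}_i(B_s | Z_s)}{\hat{q}_j(B_s | Z_s)} .
$$

If, for some index $i^*$, the summed log-likelihood ratio against every other hypothesis $\hat{q}_j$ grows without bound, each ratio $\mu_t(\hat{q}_{i^*})/\mu_t(\hat{q}_j)$ diverges and the belief concentrates on $\hat{q}_{i^*}$, i.e., $\mu_\infty = \delta_{\hat{q}_{i^*}}$. Because $p(\cdot\,;\mu_s)$ depends on the current belief, the propositions $Z_s$ are not i.i.d., which is why conditions such as the full-support assumption on $p$ are needed to make this argument rigorous.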

Figures (5)

Figure 1: Illustration of the setup described in the Model section. (Left) At each time step $t$, the platform issues propositions $Z_t$, and the user responds with behaviors $B_t$. The user's actions are determined by their strategy $q: \mathcal{Z} \rightarrow \Delta(\mathcal{B})$. The platform's actions are determined by the algorithm $p$, the hypothesis class $\widehat{\mathcal{Q}}$, and the platform's belief $\mu_t$ over the hypothesis class at time $t$. (Right) The platform's actions at time $t$ depend on its belief $\mu_t$. Here, $\mu_t$ is a distribution (i.e., a set of weights) over $\widehat{\mathcal{Q}}$ such that $\mu_t ( \hat{q}_i )$ denotes the probability that the platform assigns to the user model $q = \hat{q}_i$ at time $t$.
Figure 2: Convergence of platform beliefs about the user as $t \rightarrow \infty$. (Left) Suppose the user adopts strategy $q$, and the platform begins with an initial belief $\mu_0$. For illustrative purposes, we visualize the platform's initial belief using the corresponding estimate $\hat{q}^0$, and we use the orange polygon to represent $\text{ConvexHull}(\widehat{\mathcal{Q}})$. (In this figure, superscripts on $q$ represent time steps, and subscripts index hypotheses/models in $\widehat{\mathcal{Q}}$.) As the platform collects data, its estimate evolves, eventually converging. The beliefs to which the platform converges are given by the globally stable set, as defined in Definition 4.2; in this figure, we visualize the globally stable set $\widehat{\mathcal{Q}}_\infty \subset \widehat{\mathcal{Q}}$ as a singleton set $\widehat{\mathcal{Q}}_\infty = \{\hat{q}^\infty\}$ such that the platform's limiting belief under $(q, p, \widehat{\mathcal{Q}})$ is the point-mass belief $\mu_\infty = \delta_{\hat{q}^\infty}$. (Right) As formalized in Definition 4.2, the belief to which the platform converges depends on the platform's strategy $(p, \widehat{\mathcal{Q}})$ and the user's strategy $q$. We illustrate this dependence by visualizing how changing the platform's hypothesis class (from $\widehat{\mathcal{Q}}$ to $\widehat{\mathcal{Q}}'$) affects the platform's limiting belief (from $\delta_{\hat{q}^\infty}$ to $\delta_{\hat{q}^{\prime, \infty}}$).
Figure 3: Illustration of a naive user (Definition 4.1) and a strategic user (Definition 4.4). (Left) The (convex hull of the) platform's hypothesis class $\widehat{\mathcal{Q}}$ is given by the orange polygon. The naive user's strategy $q^\text{BR}$ is given by the solid green dot. As in Figure 2, the platform's estimate of $q^\text{BR}$ evolves as $t \rightarrow \infty$; we visualize the limiting estimate as $\hat{q}_i = \hat{q}^{\text{BR}, \infty}$. (Right) The strategic user considers their expected payoff under the platform's limiting estimate, i.e., under $(p^{\delta_{\hat{q}^{\text{BR}, \infty}}}, q^\text{BR})$, and finds that they can instead adopt the strategy $q^*(p, \widehat{\mathcal{Q}})$ that leads the platform to a belief (and in turn, a proposition distribution) that is more favorable for the user, i.e., their expected payoff under $(p^{\delta_{\hat{q}^{*, \infty}}}, q^*(p, \widehat{\mathcal{Q}}))$ exceeds their expected payoff under $(p^{\delta_{\hat{q}^{\text{BR}, \infty}}}, q^\text{BR})$.
Figure 4: The recommender system that we consider in our stylized example. The platform's hypothesis class consists of three user models: under one, the user watches exclusively horror movies; under another, exclusively comedies; and under the last, the user is equally interested in comedy and horror. The platform represents the user as a convex combination of these models, which dictates the recommendations that the platform gives. For example, as shown at the bottom of the figure, if the platform believes the user is of type 1, then it shows the user horror movies with 99% probability.
Figure 5: Naive and strategic user strategies in our stylized example. We consider a user whose affinity $a(Z)$ for each item is encoded by the item's opacity in the figure. A naive strategy for this user (left) would click on item $Z$ if and only if $a(Z) = 1$. This strategy would result in the platform modeling the user as the "clicks on anything" user $\hat{q}_3$ (see Figure 4), and thus serving a feed that is 50% comedy and 50% horror. If the user is strategic (right), they recognize that the naive strategy is suboptimal and avoid clicking on "outlier" comedy videos that they enjoy. The platform thus estimates the user as a "clicks only on horror" user $\hat{q}_1$ and serves a feed that better suits the user. Notably, both user and platform payoffs are higher when the user is strategic.
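The stylized example in Figures 4 and 5 can be checked with a small calculation. The Python sketch below is a hypothetical reconstruction, not the paper's setup: the hypotheses' click probabilities (0.99 and 0.01), the feed shares for each point-mass belief (99% horror for type 1 as in Figure 4, 50% for $\hat{q}_3$ as in Figure 5, and 1% for type 2, assumed by symmetry), the fraction of comedies the user enjoys (`COMEDY_AFFINITY = 0.6`), and both payoff functions are all assumptions. Instead of simulating noisy Bayesian updates, it relies on the standard fact that, for a fixed feed, the posterior concentrates on the hypothesis with the highest expected log-likelihood (the one closest in KL divergence to the user's actual clicks). It then iterates "fit, then serve that hypothesis's feed" until the fitted hypothesis stops changing.

```python
import math

# Hypothetical click models for the three hypotheses of Figure 4:
# q_hat_i(B=1 | Z) for a horror or comedy proposition Z. Values are assumptions.
HYPOTHESES = {
    "q1_horror_only": {"horror": 0.99, "comedy": 0.01},
    "q2_comedy_only": {"horror": 0.01, "comedy": 0.99},
    "q3_clicks_anything": {"horror": 0.99, "comedy": 0.99},
}

# Share of horror in the feed under a point-mass belief on each hypothesis (assumed).
HORROR_SHARE = {"q1_horror_only": 0.99, "q2_comedy_only": 0.01, "q3_clicks_anything": 0.5}

# Assumed affinities: every horror item has a(Z) = 1; this fraction of comedies does too.
COMEDY_AFFINITY = 0.6

NAIVE = {"horror": 1.0, "comedy": COMEDY_AFFINITY}  # clicks iff a(Z) = 1
STRATEGIC = {"horror": 1.0, "comedy": 0.0}          # skips the comedies it enjoys


def horror_share(belief):
    """Feed composition as the belief-weighted (convex) combination of per-type shares."""
    return sum(w * HORROR_SHARE[name] for name, w in belief.items())


def expected_loglik(hyp, clicks, h):
    """Expected per-round log-likelihood of `hyp` when the feed is horror with
    probability h and the user clicks each genre with probability clicks[genre]."""
    total = 0.0
    for genre, share in (("horror", h), ("comedy", 1.0 - h)):
        p, c = HYPOTHESES[hyp][genre], clicks[genre]
        total += share * (c * math.log(p) + (1.0 - c) * math.log(1.0 - p))
    return total


def best_fit(clicks, h):
    """Hypothesis with the highest expected log-likelihood (smallest KL divergence) at feed h."""
    return max(HYPOTHESES, key=lambda hyp: expected_loglik(hyp, clicks, h))


def limiting_hypothesis(clicks, max_iter=10):
    """From a uniform prior, alternate 'fit the data' and 'serve the fitted type's feed'
    until the fitted hypothesis repeats (or max_iter iterations pass)."""
    belief = {name: 1.0 / len(HYPOTHESES) for name in HYPOTHESES}
    fit = None
    for _ in range(max_iter):
        new_fit = best_fit(clicks, horror_share(belief))
        if new_fit == fit:
            break
        fit = new_fit
        belief = {name: float(name == fit) for name in HYPOTHESES}
    return fit


def payoffs(clicks, hyp):
    """(user, platform) payoffs under the feed of a point-mass belief on `hyp`.
    Assumed: user payoff = expected affinity of shown items; platform payoff = click rate."""
    h = HORROR_SHARE[hyp]
    user = h + (1.0 - h) * COMEDY_AFFINITY
    platform = h * clicks["horror"] + (1.0 - h) * clicks["comedy"]
    return user, platform


for label, clicks in (("naive", NAIVE), ("strategic", STRATEGIC)):
    hyp = limiting_hypothesis(clicks)
    print(label, hyp, payoffs(clicks, hyp))
```

With these numbers the naive user is fit as $\hat{q}_3$ and the strategic user as $\hat{q}_1$, and both payoffs are higher in the strategic case, mirroring the qualitative claim in Figure 5. Because $\hat{q}_1$ and $\hat{q}_3$ make identical predictions on horror here, the naive user lands on $\hat{q}_3$ only if they enjoy more than half of the comedies; with other parameter choices more than one hypothesis can be self-confirming, and the result would then depend on the starting belief.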

Theorems & Definitions (58)

Example 3.1: Hiring example
Example 3.2: Uber example
Definition 4.1: Naive user
Definition 4.2: Globally stable set
Definition 4.3: Expected payoffs
Definition 4.4: Strategic user
Proposition 5.0
proof
Proposition 5.0
proof
...and 48 more
