User Strategization and Trustworthy Algorithms
Sarah H. Cen, Andrew Ilyas, Aleksander Madry
TL;DR
This paper investigates the mismatch between data-driven platforms and human users who can strategically respond to platform prompts. By modeling the interaction as a repeated Stackelberg-style game with Bayesian belief updates, it shows that user strategization can improve short-run platform payoffs but distorts data and undermines counterfactual inference, revealing a tension between adaptivity and exogenous-data assumptions. It then formalizes a notion of κ-trustworthy algorithms that both discourage strategization and guarantee a minimum user payoff, and proposes practical interventions (offering multiple algorithms and feedback mechanisms) to enhance trust and data reliability. The results highlight that trustworthy design can align user and platform incentives, improving long-run payoffs and the quality of learned models, while also clarifying why naive trust-boosting approaches may fall short. Overall, the work connects trust, data exogeneity, and counterfactual reasoning, offering a formal framework and interventions for designing more robust, user-aligned data-driven systems.
Abstract
Many human-facing algorithms, including those that power recommender systems or hiring decision tools, are trained on data provided by their users. The developers of these algorithms commonly adopt the assumption that the data generating process is exogenous: that is, how a user reacts to a given prompt (e.g., a recommendation or hiring suggestion) depends on the prompt and not on the algorithm that generated it. For example, the assumption that a person's behavior follows a ground-truth distribution is an exogeneity assumption. In practice, when algorithms interact with humans, this assumption rarely holds because users can be strategic. Recent studies document, for example, TikTok users changing their scrolling behavior after learning that TikTok uses it to curate their feed, and Uber drivers changing how they accept and cancel rides in response to changes in Uber's algorithm. Our work studies the implications of this strategic behavior by modeling the interactions between a user and their data-driven platform as a repeated, two-player game. We first find that user strategization can actually help platforms in the short term. We then show that it corrupts platforms' data and ultimately hurts their ability to make counterfactual decisions. We connect this phenomenon to user trust, and show that designing trustworthy algorithms can go hand in hand with accurate estimation. Finally, we provide a formalization of trustworthiness that inspires potential interventions.
