Ranking with Long-Term Constraints
Kianté Brantley, Zhichong Fang, Sarah Dean, Thorsten Joachims
TL;DR
The paper tackles the problem of aligning long-term platform objectives with short-term ranking quality. It formalizes a macro–micro control framework in which long-horizon exposure or impact targets are enforced through per-step rankings, and it introduces three controllers: Myopic (MC), Stationary (SC), and Predictive (PC). All three are grounded in a full-information linear-utility model with $u(a|x)$ and $c(a|x)$, and the paper relates SC to online convex optimization and to proportional (P) control. PC extends this by forecasting the progression-to-go $\widehat{C}_t^b$ across $B$ futures, which enables planning under non-stationary demand and improves long-term objectives on non-stationary data. Empirical results on KuaiRec, TV Audience, and synthetic data show that SC and PC generally outperform MC, with PC offering the largest gains when temporal patterns are present. Overall, the work provides a principled toolkit for steering ranking platforms toward long-term welfare while preserving short-term engagement, and it highlights trade-offs in planning ability, robustness, and computational cost.
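The controllers above differ from myopic ranking in that they react to the gap between a long-term target and what has been achieved so far. The sketch below illustrates that feedback idea with a simple proportional correction on cumulative exposure. It is not the paper's MC, SC, or PC. The position-based exposure model (1/log2(rank + 1)), the function names `position_exposure` and `run_exposure_controller`, the additive score correction, and the two-item toy data are all hypothetical assumptions chosen for illustration.

```python
import numpy as np


def position_exposure(n_items: int) -> np.ndarray:
    """Hypothetical position-based exposure: 0-based rank k gets 1 / log2(k + 2)."""
    return 1.0 / np.log2(np.arange(2, n_items + 2))


def run_exposure_controller(
    utilities: np.ndarray,     # shape (T, n): known utility of each item at each step
    target_share: np.ndarray,  # shape (n,): desired long-run share of total exposure
    gain: float,               # proportional gain; 0.0 reduces to the myopic ranking
) -> tuple[np.ndarray, float]:
    """Illustrative (hypothetical) P-style controller, not the paper's algorithm."""
    n_steps, n_items = utilities.shape
    exposure = position_exposure(n_items)
    cum_exposure = np.zeros(n_items)
    total_utility = 0.0
    for t in range(n_steps):
        # Error signal: exposure each item should have received so far minus
        # what it actually received (positive means the item is behind target).
        deficit = target_share * cum_exposure.sum() - cum_exposure
        # Myopic score plus a correction proportional to the error.
        scores = utilities[t] + gain * deficit
        ranking = np.argsort(-scores, kind="stable")  # item indices, best first
        cum_exposure[ranking] += exposure             # item ranking[k] gets exposure[k]
        total_utility += float(utilities[t][ranking] @ exposure)
    return cum_exposure / cum_exposure.sum(), total_utility


# Toy usage (hypothetical data): two items with constant utilities 1.0 and 0.9,
# and a long-term target of an even exposure split.
T = 1000
utilities = np.tile([1.0, 0.9], (T, 1))
target = np.array([0.5, 0.5])
shares_myopic, util_myopic = run_exposure_controller(utilities, target, gain=0.0)
shares_ctrl, util_ctrl = run_exposure_controller(utilities, target, gain=1.0)
print(shares_myopic.round(3), util_myopic)
print(shares_ctrl.round(3), util_ctrl)
```

With `gain=0.0` the higher-utility item always takes the top slot and ends with roughly 61% of total exposure. With `gain=1.0` the two items alternate in the top slot and finish at an even split, at a small cost in total utility, which mirrors in miniature the trade-off between long-term targets and short-term engagement described above.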
Abstract
The feedback that users provide through their choices (e.g., clicks, purchases) is one of the most common types of data readily available for training search and recommendation algorithms. However, myopically training systems based on choice data may improve only short-term engagement, not the long-term sustainability of the platform or the long-term benefits to its users, content providers, and other stakeholders. In this paper, we thus develop a new framework in which decision makers (e.g., platform operators, regulators, users) can express long-term goals for the behavior of the platform (e.g., fairness, revenue distribution, legal requirements). These goals take the form of exposure or impact targets that go well beyond individual sessions, and we provide new control-based algorithms to achieve these goals. In particular, the controllers are designed to achieve the stated long-term goals with minimum impact on short-term engagement. Beyond the principled theoretical derivation of the controllers, we evaluate the algorithms on both synthetic and real-world data. While all controllers perform well, we find that they provide interesting trade-offs in efficiency, robustness, and the ability to plan ahead.
