Ranking with Long-Term Constraints

Kianté Brantley, Zhichong Fang, Sarah Dean, Thorsten Joachims

TL;DR

The paper tackles the problem of aligning long-term platform objectives with short-term ranking quality. It formalizes a macro–micro control framework in which long-horizon exposure or impact targets are enforced through per-step rankings. It introduces three controllers, Myopic (MC), Stationary (SC), and Predictive (PC), grounded in a full-information linear-utility model with $u(a|x)$ and $c(a|x)$, and it connects SC to online convex optimization and to P-control. PC extends SC by forecasting the progress-to-go $\widehat{C}_t^b$ across $B$ futures, which enables planning under non-stationary demand. Empirical results on KuaiRec, Tv Audience, and synthetic data show that SC and PC generally outperform MC, with PC offering the largest gains when temporal patterns are present. Overall, the work provides a principled toolkit for steering ranking platforms toward long-term welfare while preserving short-term engagement, and it highlights trade-offs in planning, robustness, and computational cost.
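To make the P-control connection concrete, here is a minimal sketch of a proportional controller that steers group exposure toward a target while ranking by utility. It is an illustration under stated assumptions, not the paper's SC or PC algorithm: the function names, the DCG-style position weights, the two-group setup, the synthetic utilities, and the `gain` parameter are all hypothetical choices made for this example.

```python
import numpy as np

def position_weights(k):
    # Assumed exposure model: DCG-style weight 1 / log2(rank + 1) for ranks 1..k.
    return 1.0 / np.log2(np.arange(2, k + 2))

def p_control_rank(utility, group, exposure_so_far, target_share, gain):
    """Rank items by utility plus a proportional correction on the exposure error."""
    total = exposure_so_far.sum()
    share = exposure_so_far / total if total > 0 else np.zeros_like(exposure_so_far)
    error = target_share - share              # positive: group is under-exposed
    scores = utility + gain * error[group]    # P-control: correction proportional to error
    return np.argsort(-scores)                # item indices, best first

def simulate(steps=200, gain=5.0, seed=0):
    """Run the controller on synthetic data; return the final exposure share per group."""
    rng = np.random.default_rng(seed)
    group = np.array([0, 0, 0, 1, 1, 1])      # hypothetical: two groups of three items
    target_share = np.array([0.5, 0.5])       # hypothetical long-term exposure target
    weights = position_weights(len(group))
    exposure = np.zeros(2)
    for _ in range(steps):
        utility = rng.uniform(0.0, 1.0, len(group))
        utility[group == 0] += 0.5            # group 0 is systematically more relevant
        ranking = p_control_rank(utility, group, exposure, target_share, gain)
        # The item at rank i contributes weights[i] to its group's accumulated exposure.
        np.add.at(exposure, group[ranking], weights)
    return exposure / exposure.sum()

if __name__ == "__main__":
    print("no control:", simulate(gain=0.0))
    print("P-control :", simulate(gain=5.0))
```

With `gain=0.0` the sketch reduces to pure utility ranking, so the more relevant group accumulates most of the exposure. A positive gain trades some per-step utility for closer tracking of the target share.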

Abstract

The feedback that users provide through their choices (e.g., clicks, purchases) is one of the most common types of data readily available for training search and recommendation algorithms. However, myopically training systems based on choice data may only improve short-term engagement, but not the long-term sustainability of the platform and the long-term benefits to its users, content providers, and other stakeholders. In this paper, we thus develop a new framework in which decision makers (e.g., platform operators, regulators, users) can express long-term goals for the behavior of the platform (e.g., fairness, revenue distribution, legal requirements). These goals take the form of exposure or impact targets that go well beyond individual sessions, and we provide new control-based algorithms to achieve these goals. In particular, the controllers are designed to achieve the stated long-term goals with minimum impact on short-term engagement. Beyond the principled theoretical derivation of the controllers, we evaluate the algorithms on both synthetic and real-world data. While all controllers perform well, we find that they provide interesting trade-offs in efficiency, robustness, and the ability to plan ahead.

Paper Structure (27 sections, 26 equations, 5 figures, 1 table, 8 algorithms)


Figures (5)

  • Figure 1: We propose to separate macro-level control used for steering the long-term dynamics of the platform from its micro-level engagement optimization. The interface layer provides an abstraction by optimally translating strategic macro-level interventions into a sequence of micro-level actions with minimal impact on short-term metrics.
  • Figure 2: Experiment results comparing all controllers across two datasets, KuaiRec and Tv Audience. The x-axis is $\phi$ on a log scale in all plots. The first column is the final value of the main objective, the middle column is the utility metric (DCG), and the final column is the macro-violation. The oracle has access to the test-time contexts and directly optimizes the original main objective. The MC w/o constraints is an unconstrained, utility-maximizing controller.
  • Figure 3: Comparison of all controllers on a synthetic dataset to showcase when the PC should be preferred. The o's and x's represent two groups of items. The top row is the average utility over time and the dashed grey line represents the highest achievable utility under the exposure constraint. The bottom row displays the exposure over time of both item groups. The grey x's and o's represent the target exposure for the groups.
  • Figure 4: Comparison of two different versions of the last.fm dataset. The left plot enforces a temporal pattern during training, and the right plot shuffles the dataset, which breaks the temporal pattern. In both plots, the test-time contexts have a temporal pattern and the target exposure is kept the same.
  • Figure 5: Comparison of different forecast samples used for computing the progress-to-go. The KuaiRec dataset on the left is non-temporal and the Tv Audience dataset on the right is temporal.