Learned Ranking Function: From Short-term Behavior Predictions to Long-term User Satisfaction

Yi Wu, Daryl Chang, Jennifer She, Zhe Zhao, Li Wei, Lukasz Heldt

TL;DR

The paper tackles the problem of optimizing long-term user satisfaction in slate-based recommendations by reframing ranking as slate optimization under a multi-objective MDP. It introduces the Learned Ranking Function (LRF), which uses a cascade click model and lift-based rewards to account for abandonment, and it employs a constrained optimization approach via dynamic linear scalarization to stabilize trade-offs across objectives. A practical on-policy Monte Carlo optimization framework trains separate user-item networks for abandonment, click, and lift signals, enabling inference that maximizes a learned $Q$-function. The approach is deployed on YouTube and validated through multiple live experiments, demonstrating improvements in long-term satisfaction and showcasing the value of lift formulations, cascade modeling, and offline-evaluation-guided weight adaptation for multi-objective slate optimization.

Abstract

We present the Learned Ranking Function (LRF), a system that takes short-term user-item behavior predictions as input and outputs a slate of recommendations that directly optimizes for long-term user satisfaction. Most previous work is based on optimizing the hyperparameters of a heuristic function. We propose to model the problem directly as a slate optimization problem with the objective of maximizing long-term user satisfaction. We also develop a novel constraint optimization algorithm that stabilizes objective trade-offs for multi-objective optimization. We evaluate our approach with live experiments and describe its deployment on YouTube.

Paper Structure

This paper contains 28 sections, 1 theorem, 16 equations, 3 figures, 2 algorithms.

Key Result

Theorem 2.1

Given user-item functions $p_{clk},p_{abd},R_{abd}^{\pi},R_{lift}^{\pi}$ as input, the optimal ranking for user $u$ on candidate $V$ maximizing $Q^\pi((u,V),\sigma)$ for a scalar reward function is to order all items $v\in V$ by $\frac{p_{clk}(u,v)}{p_{clk}(u,v)+p_{abd}(u,v)} \cdot R_{lift}^{\pi}(u,\ldots$
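
The ordering rule in Theorem 2.1 can be illustrated with a small sketch. The statement above is cut off after the lift term, so this example assumes the lift is evaluated at the same pair $(u,v)$, that the score is exactly the product shown, and that items are sorted in descending order of that score. The `Candidate` class, the `lrf_score` and `rank` functions, and all numeric values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """Per-item predictions for one user (hypothetical container)."""
    item_id: str
    p_clk: float   # predicted click probability p_clk(u, v)
    p_abd: float   # predicted abandonment probability p_abd(u, v)
    r_lift: float  # predicted lift; ASSUMED to be R_lift(u, v), the final factor


def lrf_score(c: Candidate) -> float:
    """Score p_clk / (p_clk + p_abd) * r_lift, as read from the truncated theorem."""
    denom = c.p_clk + c.p_abd
    if denom == 0.0:
        # Guard for this sketch only: no click or abandonment mass.
        return 0.0
    return c.p_clk / denom * c.r_lift


def rank(candidates: List[Candidate]) -> List[Candidate]:
    """Order candidates by descending score (descending order is an assumption)."""
    return sorted(candidates, key=lrf_score, reverse=True)


slate = rank([
    Candidate("a", p_clk=0.2, p_abd=0.2, r_lift=1.0),  # 0.50 * 1.0
    Candidate("b", p_clk=0.3, p_abd=0.1, r_lift=0.8),  # 0.75 * 0.8
    Candidate("c", p_clk=0.1, p_abd=0.3, r_lift=1.6),  # 0.25 * 1.6
])
print([c.item_id for c in slate])
```

In this toy input, item "c" has the largest lift but ranks last because its click probability is small relative to its abandonment probability, which is the trade-off the score is meant to capture.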

Figures (3)

  • Figure 1: Markov Reward Process with Cascade Click Model
  • Figure 2: LRF deployment diagram
  • Figure 3: Metrics for experiments in Section \ref{sec:launch} (top left), \ref{sec:cascade} (top right), \ref{sec:uplift} (bottom left), and \ref{sec:two} (bottom right).

Theorems & Definitions (5)

  • Definition 1
  • Definition 2
  • Theorem 2.1
  • Proof
  • Definition 3