Relative Advantage Debiasing for Watch-Time Prediction in Short-Video Recommendation

Emily Liu, Kuan Han, Minfeng Zhan, Bocheng Zhao, Guanyu Mu, Yang Song

TL;DR

This paper proposes a relative advantage debiasing framework that corrects watch time by comparing it to empirically derived reference distributions conditioned on user and item groups, together with a two-stage architecture that explicitly separates distribution estimation from preference learning.

Abstract

Watch time is widely used as a proxy for user satisfaction in video recommendation platforms. However, raw watch times are influenced by confounding factors such as video duration, popularity, and individual user behaviors, potentially distorting preference signals and resulting in biased recommendation models. We propose a novel relative advantage debiasing framework that corrects watch time by comparing it to empirically derived reference distributions conditioned on user and item groups. This approach yields a quantile-based preference signal and introduces a two-stage architecture that explicitly separates distribution estimation from preference learning. Additionally, we present distributional embeddings to efficiently parameterize watch-time quantiles without requiring online sampling or storage of historical data. Both offline and online experiments demonstrate significant improvements in recommendation accuracy and robustness compared to existing baseline methods.
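
As a rough illustration of the quantile-based preference signal described in the abstract, the sketch below scores a watch time by its empirical CDF within a reference group. The grouping key (a video-duration bucket), the function names `build_reference` and `relative_advantage`, and the toy data are assumptions made for illustration. The paper parameterizes these distributions with learned distributional embeddings rather than stored histories, so this is not the authors' implementation.

```python
from collections import defaultdict

import numpy as np


def build_reference(groups, watch_times):
    """Collect the sorted watch times observed in each group.

    Each sorted array acts as the empirical reference distribution
    for that group (hypothetical helper, not from the paper).
    """
    buckets = defaultdict(list)
    for g, w in zip(groups, watch_times):
        buckets[g].append(w)
    return {g: np.sort(np.asarray(v, dtype=float)) for g, v in buckets.items()}


def relative_advantage(ref, group, watch_time):
    """Empirical CDF of watch_time within its group.

    Returns the fraction of reference watch times <= watch_time,
    i.e. a quantile-style label in [0, 1].
    """
    sorted_vals = ref[group]
    rank = np.searchsorted(sorted_vals, watch_time, side="right")
    return rank / len(sorted_vals)


# Toy data: watch times in seconds, grouped by an assumed duration bucket.
groups = ["short"] * 5 + ["long"] * 4
watch_times = [2, 4, 6, 8, 10, 10, 20, 30, 40]
ref = build_reference(groups, watch_times)

# The same raw 10 seconds ranks very differently in the two groups.
short_score = relative_advantage(ref, "short", 10.0)
long_score = relative_advantage(ref, "long", 10.0)
```

In this toy setup, ten seconds is the longest watch time seen for short videos but the shortest for long ones, so the raw value alone would hide the difference in relative engagement.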

Paper Structure

This paper contains 44 sections, 2 theorems, 12 equations, 2 figures, 9 tables, 2 algorithms.

Key Result

Proposition 1

For any video-side confounder $c^{(k)}$,

Figures (2)

  • Figure 1: An overview of the RAD framework. First, distributional embeddings for user- and video-side confounders are learned from historical data. Then, these embeddings are used to measure the CDF of a biased watch-time value conditioned on user and video confounding variables. User-side and video-side signals are merged through Bayesian fusion to create a single bias-free label for preference learning.
  • Figure 2: Gaussian kernel density estimates of predicted versus ground-truth watch-time distributions for user-side clusters grouped by training-set size quartile: (A) bottom 25%, (B) 25–50%, (C) 50–75%, and (D) top 25%. CQE’s single-stage estimates (blue) fail to capture the lower-value regions in (A) and (D), miss secondary modes in (B) and (C), and underestimate heavy tails in (A), (B), and (D), leading to larger errors. In contrast, the MQ + MLP multiquantile model (red) more closely follows the ground-truth curves (green) across all cohort sizes, effectively modeling both typical and irregular shapes.
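
The Figure 1 caption mentions measuring the CDF of a watch-time value from a learned distribution. One plausible way to do this, given a handful of predicted quantiles, is to interpolate between them, as sketched below. The quantile levels, the linear interpolation, and the function name `cdf_from_quantiles` are assumptions for illustration; the source does not specify how the CDF is read off the embeddings, and the Bayesian fusion step is not shown here.

```python
import numpy as np


def cdf_from_quantiles(taus, quantile_values, watch_time):
    """Approximate CDF(watch_time) from a set of predicted quantiles.

    taus: increasing quantile levels in (0, 1).
    quantile_values: predicted watch time at each level.
    Linear interpolation between quantiles is an assumed choice.
    """
    taus = np.asarray(taus, dtype=float)
    q = np.asarray(quantile_values, dtype=float)
    # Enforce non-decreasing quantiles in case the predictions cross.
    q = np.maximum.accumulate(q)
    # np.interp clamps to the first/last level outside the quantile range.
    return float(np.interp(watch_time, q, taus))


# Hypothetical predicted quantiles for one confounder group (seconds).
taus = [0.1, 0.5, 0.9]
quantile_values = [2.0, 10.0, 30.0]

median_score = cdf_from_quantiles(taus, quantile_values, 10.0)
upper_score = cdf_from_quantiles(taus, quantile_values, 20.0)
```

Because only the quantile values are needed at lookup time, a scheme like this avoids storing or sampling historical watch times, which matches the motivation given in the abstract.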

Theorems & Definitions (2)

  • Proposition 1: Variance monotonicity under umbrella sufficiency
  • Proposition 2: Statistics and Independence of Quantile Labels