
CREAD: A Classification-Restoration Framework with Error Adaptive Discretization for Watch Time Prediction in Video Recommender Systems

Jie Sun, Zhaoying Ding, Xiaoshuang Chen, Qi Chen, Yincheng Wang, Kaiqiao Zhan, Ben Wang

TL;DR

This work tackles watch time prediction under highly imbalanced distributions by recasting regression as a Classification-Restoration problem. The proposed CREAD framework introduces an error-adaptive discretization (EAD) that balances learning error against restoration error, and combines it with a set of ordinal classifiers and a restoration mechanism that recovers continuous watch time. The authors provide theoretical error bounds for discretization, present a practical EAD strategy based on a calibrating function, and validate the approach with offline experiments on a public and an industrial dataset and with an online deployment showing a $0.29\%$ uplift in watch time. Overall, CREAD demonstrates that careful discretization design and ordinal modeling provide measurable gains in production-scale video recommender systems.

Abstract

Watch time is a significant indicator of user satisfaction in video recommender systems. However, predicting watch time as a target variable is often hindered by its highly imbalanced distribution, with scarce observations for larger target values and over-populated samples for small values. State-of-the-art watch time prediction models discretize the continuous watch time into a set of buckets in order to account for its distribution. However, how these discrete buckets should be created from the continuous watch time distribution remains largely unexplored, and existing discretization approaches suffer from either a large learning error or a large restoration error. To address this challenge, we propose a Classification-Restoration framework with Error-Adaptive Discretization (CREAD) to accurately predict the watch time. The proposed framework contains a discretization module, a classification module, and a restoration module. It predicts the watch time through multiple classification problems. The discretization process is a key contribution of the CREAD framework. We theoretically analyze the impact of the discretization on the learning error and the restoration error, and then propose the error-adaptive discretization (EAD) technique to better balance the two errors, which achieves better performance than traditional discretization approaches. We conduct detailed offline evaluations on a public dataset and an industrial dataset, both showing performance gains from the proposed approach. Moreover, we have fully launched our framework on Kwai App, an online video platform, which resulted in a significant increase of 0.29% in users' video watch time, as measured through A/B testing. These results highlight the effectiveness of the CREAD framework for watch time prediction in video recommender systems.
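
To make the three modules concrete, here is a minimal sketch and not the authors' implementation. It assumes that classifier m estimates P(y > t_m) for bucket threshold t_m, and that restoration sums these probabilities weighted by bucket widths. The function names, the toy watch times, and the two baseline discretizations are illustrative assumptions; the classifiers are replaced by oracle 0/1 labels so that only the restoration error is visible.

```python
def equal_width(t_max, m):
    """Baseline discretization: m thresholds evenly spaced up to t_max."""
    return [t_max * k / m for k in range(1, m + 1)]


def equal_frequency(samples, m):
    """Baseline discretization: thresholds at the k/m empirical quantiles."""
    s = sorted(samples)
    n = len(s)
    # ceil(n * k / m) - 1 is the index of the k/m quantile in the sorted list
    return [s[(n * k + m - 1) // m - 1] for k in range(1, m + 1)]


def ordinal_labels(y, thresholds):
    """Targets for the m binary classifiers: label m is 1 when y > t_m."""
    return [1 if y > t else 0 for t in thresholds]


def restore(probs, thresholds):
    """Recover a watch-time estimate from estimates of P(y > t_m).

    Assumed restoration rule: sum of probs[m] * (t_m - t_{m-1}) with t_0 = 0,
    i.e. a Riemann sum of the survival function over the buckets.
    """
    edges = [0.0] + list(thresholds)
    return sum(p * (hi - lo) for p, lo, hi in zip(probs, edges, edges[1:]))


# Toy, heavily skewed watch times in seconds (hypothetical data).
watch = [1, 1, 2, 2, 3, 5, 20, 60]
ef = equal_frequency(watch, 4)   # narrow buckets where data is dense
ew = equal_width(60, 4)          # uniform buckets, mostly empty in the tail

for y in (3, 20):
    for name, ts in (("equal-frequency", ef), ("equal-width", ew)):
        y_hat = restore(ordinal_labels(y, ts), ts)
        print(f"y={y:>2} {name:<15} thresholds={ts} restored={y_hat}")
```

Even with perfect classifiers, the restored value can be far from the true one when a bucket is wide: in this toy data, equal-frequency buckets are coarse for long watch times and equal-width buckets are coarse for short ones. With learned classifiers, imbalanced labels in sparsely populated buckets add a learning error on top, which is the trade-off EAD is designed to balance.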

Paper Structure

This paper contains 39 sections, 4 theorems, 37 equations, 9 figures, and 4 tables.

Key Result

Lemma 4.1

Assume that $\hat{p}_m({\bm{x}})$ and $\hat{w}_m$ are unbiased estimations of $p_m({\bm{x}})$ and $w_m$, respectively, we have where

Figures (9)

  • Figure 1: Probability density plot of watch time.
  • Figure 2: The CREAD Framework.
  • Figure 3: Two kinds of errors in the discretization.
  • Figure 4: EAD compared with traditional methods.
  • Figure 5: Error terms in different hyperparameters.
  • ...and 4 more figures

Theorems & Definitions (11)

  • Lemma 4.1
  • Theorem 4.2
  • proof
  • proof
  • Lemma A.1
  • proof
  • Lemma A.2
  • proof
  • proof: Proof of the Upper Bound of $V_p$ in Theorem 4.2
  • proof: Proof of the Upper Bound of $V_w$ in Theorem 4.2
  • ...and 1 more