Entire-Space Variational Information Exploitation for Post-Click Conversion Rate Prediction

Ke Fei, Xinyue Zhang, Jingjing Li

TL;DR

An entire-space variational information exploitation framework (EVI) for CVR prediction that uses a conditional entire-space CVR teacher to generate unbiased pseudo labels and applies variational information exploitation and logit distillation to transfer non-click space information to the target CVR estimator.

Abstract

In recommender systems, post-click conversion rate (CVR) estimation is an essential task for modeling user preferences for items and estimating the value of recommendations. Sample selection bias (SSB) and data sparsity (DS) are two persistent challenges for CVR estimation. Entire-space approaches that exploit unclicked samples through knowledge distillation are currently a promising way to mitigate SSB and DS simultaneously. Existing methods use non-conversion, conversion, or adaptive conversion predictors to generate pseudo labels for unclicked samples. However, they fail to consider the unbiasedness and information limitations of these pseudo labels. Motivated by this analysis, we propose an entire-space variational information exploitation framework (EVI) for CVR prediction. First, EVI uses a conditional entire-space CVR teacher to generate unbiased pseudo labels. Then, it applies variational information exploitation and logit distillation to transfer non-click space information to the target CVR estimator. We conduct extensive offline experiments on six large-scale datasets. EVI achieves a 2.25% average improvement over state-of-the-art baselines.
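The idea of training a CVR student on the entire space, with observed conversion labels on clicked samples and teacher pseudo labels on unclicked samples, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of binary cross-entropy against the teacher's sigmoid output, and the `distill_weight` parameter are all assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bce(label, prob, eps=1e-7):
    """Binary cross-entropy for one sample; the label may be soft (in [0, 1])."""
    p = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


def entire_space_cvr_loss(clicks, conversions, student_logits,
                          teacher_logits, distill_weight=1.0):
    """Average loss over all impressions (hypothetical formulation).

    Clicked samples use the observed conversion label.
    Unclicked samples use the teacher's predicted probability as a
    soft pseudo label (logit distillation); distill_weight is an
    assumed hyperparameter, not one taken from the paper.
    """
    total = 0.0
    for o, r, s, t in zip(clicks, conversions, student_logits, teacher_logits):
        p_student = sigmoid(s)
        if o == 1:
            total += bce(float(r), p_student)
        else:
            total += distill_weight * bce(sigmoid(t), p_student)
    return total / len(clicks)
```

A student whose logits on unclicked samples sit close to the teacher's incurs a lower loss than one that disagrees with it, which is the mechanism by which non-click space information reaches the student in this sketch.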

Paper Structure

This paper contains 27 sections, 6 theorems, 29 equations, 3 figures, 3 tables.

Key Result

Theorem 1

The entire-space CVR estimator is biased when the pseudo conversion labels of unclicked samples $r^*_{u,i}$ are biased, i.e.,

Figures (3)

  • Figure 1: Architecture of EVI. EVI consists of a CTR estimator, a CVR-T (teacher), and a CVR estimator (student) with a shared embedding layer. Following the Experts & Gates, three multilayer perceptrons serve as representation learners and predictors. CVR-T (teacher) produces unbiased pseudo conversion labels based on click-conditioned representations. We maximize the variational information between the CVR-T and CVR representation learners to transfer entire-space conversion knowledge to the CVR estimator.
  • Figure 2: The teachers' logloss on the non-click space and the students' CVR mean bias. EVI w/o VIE denotes EVI without variational information exploitation.
  • Figure 3: Effects of varying the VIE loss ratio and the number of transfer layers on four public datasets and one industrial dataset.
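
The Figure 1 caption mentions maximizing variational information between the teacher and student representation learners. One common way to make such an objective concrete is a Gaussian variational lower bound on mutual information, where the student predicts the teacher's representation and is scored by a Gaussian negative log-likelihood. Whether EVI uses exactly this form is not stated in this summary, so the sketch below is an assumption, and all names in it are hypothetical.

```python
import math


def gaussian_vi_loss(teacher_repr, predicted_mean, log_sigma):
    """Gaussian negative log-likelihood, averaged over dimensions.

    Models q(t | s) = N(predicted_mean, exp(log_sigma)^2) per dimension,
    where predicted_mean would be computed from the student's
    representation. The additive constant 0.5 * log(2 * pi) is dropped.
    Minimizing this value maximizes a variational lower bound on the
    mutual information between the two representations.
    """
    total = 0.0
    for t, mu, ls in zip(teacher_repr, predicted_mean, log_sigma):
        variance = math.exp(2.0 * ls)
        total += ls + (t - mu) ** 2 / (2.0 * variance)
    return total / len(teacher_repr)
```

In a setup like this, the loss would typically be added to the main training objective with a weight, which is the kind of ratio that Figure 3 varies; the exact weighting used by EVI is not given here.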

Theorems & Definitions (12)

  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • Theorem 3
  • proof
  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • ...and 2 more