LDACP: Long-Delayed Ad Conversions Prediction Model for Bidding Strategy

Peng Cui, Yiming Yang, Fusheng Jin, Siyuan Tang, Yunli Wang, Fukang Yang, Yalong Jia, Qingpeng Cai, Fei Pan, Changcheng Li, Peng Jiang

TL;DR

This paper proposes the Long-Delayed Ad Conversions Prediction model for bidding strategy (LDACP), which consists of two sub-modules: a Bucket Classification Module with label Smoothing method (BCMS) and a Value Regression Module with Proxy labels (VRMP). A Mixture of Experts (MoE) structure integrates the predictions from BCMS and VRMP to obtain the final predicted ad conversion number.

Abstract

In online advertising, once an ad campaign is deployed, the automated bidding system dynamically adjusts the bidding strategy to optimize Cost Per Action (CPA) based on the number of ad conversions. For ads with a long conversion delay, relying solely on the real-time tracked conversion number as a signal for bidding strategy can significantly overestimate the current CPA, leading to conservative bidding strategies. Therefore, it is crucial to predict the number of long-delayed conversions. Nonetheless, it is challenging to predict ad conversion numbers through traditional regression methods due to the wide range of ad conversion numbers. Previous regression works have addressed this challenge by transforming regression problems into bucket classification problems, achieving success in various scenarios. However, specific challenges arise when predicting the number of ad conversions: 1) The integer nature of ad conversion numbers exacerbates the discontinuity issue in one-hot hard labels; 2) The long-tail distribution of ad conversion numbers complicates tail data prediction. In this paper, we propose the Long-Delayed Ad Conversions Prediction model for bidding strategy (LDACP), which consists of two sub-modules. To alleviate the issue of discontinuity in one-hot hard labels, the Bucket Classification Module with label Smoothing method (BCMS) converts one-hot hard labels into non-normalized soft labels, then fits these soft labels by minimizing classification loss and regression loss. To address the challenge of predicting tail data, the Value Regression Module with Proxy labels (VRMP) uses the prediction bias of aggregated pCTCVR as proxy labels. Finally, a Mixture of Experts (MoE) structure integrates the predictions from BCMS and VRMP to obtain the final predicted ad conversion number.
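The BCMS idea (non-normalized soft labels fitted with a classification loss plus a regression loss) can be illustrated with a minimal Python sketch. This summary does not give the smoothing function, the bucket boundaries, or the loss weighting, so the sketch assumes hypothetical choices: flat buckets with edges `[0, 1, 2, 4, 8, 16]`, a kernel `exp(-|y - c_k| / tau)` over bucket centers as the non-normalized soft label, cross entropy against those soft labels, and an MSE term on the expected count under the predicted bucket distribution. It is a simplification and does not reproduce the tree-structured TPMS shown in Figure 4.

```python
import math

# Hypothetical bucket edges over conversion counts (not taken from the paper).
EDGES = [0, 1, 2, 4, 8, 16]
CENTERS = [(lo + hi) / 2 for lo, hi in zip(EDGES[:-1], EDGES[1:])]  # [0.5, 1.5, 3.0, 6.0, 12.0]


def soft_labels(y, centers, tau=1.0):
    """Assumed smoothing: bucket k gets exp(-|y - c_k| / tau).
    The values are deliberately NOT rescaled to sum to 1 (non-normalized)."""
    return [math.exp(-abs(y - c) / tau) for c in centers]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bcms_loss(logits, y, centers, tau=1.0, alpha=1.0):
    """Cross entropy against the soft labels plus alpha * MSE on the expected count."""
    probs = softmax(logits)
    targets = soft_labels(y, centers, tau)
    ce = -sum(t * math.log(p) for t, p in zip(targets, probs))
    y_hat = sum(p * c for p, c in zip(probs, centers))  # expected count under probs
    mse = (y_hat - y) ** 2
    return ce + alpha * mse, y_hat


if __name__ == "__main__":
    # Neighbouring integer counts receive similar, smoothly varying targets.
    print([round(v, 3) for v in soft_labels(3, CENTERS)])
    print([round(v, 3) for v in soft_labels(4, CENTERS)])
    loss, y_hat = bcms_loss([0.0] * len(CENTERS), 3, CENTERS)
    print(round(loss, 3), round(y_hat, 3))
```

Because each soft label changes continuously with `y`, moving from 3 to 4 conversions shifts the targets gradually instead of flipping a one-hot vector, which is the discontinuity BCMS is meant to alleviate.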

Paper Structure

This paper contains 24 sections, 22 equations, 8 figures, 6 tables.

Figures (8)

  • Figure 1: For ad campaigns using oCPM bidding, the automated bidding system dynamically adjusts the $\text{CPA}_{\text{bid}}$ coefficient based on the number of real-time tracked ad conversions. Our proposed LDACP predicts the number of conversions for ads with a long conversion delay. The predicted number of conversions will determine the bidding strategies of the automated bidding system.
  • Figure 2: The distribution characteristics of ad conversion numbers and the PCOC of the ranking model.
  • Figure 3: The LDACP consists of two sub-modules. The BCMS predicts the number of ad conversions using a bucket classification method: it transforms one-hot hard labels into non-normalized soft labels, which are then fitted by minimizing Cross Entropy Loss and MSE Loss. The VRMP learns the PCOC of the ranking model with a traditional regression method, addressing the challenge of predicting tail data by exploiting the fact that PCOC does not exhibit a long-tail distribution. The predictions from BCMS and VRMP are integrated by a Mixture of Experts (MoE) structure to obtain the predicted number of ad conversions $\hat{y}$.
  • Figure 4: An instance of a TPMS. Each edge in the tree is associated with a predictor, and when $y$ changes, $h(\psi(\cdot))$ smoothly adjusts to address the discontinuity in one-hot hard labels.
  • Figure 5: The CR and the average value of $\lambda$ on the Kuai-AD test set. $\hat{y}_{f}$ and $\hat{y}_{g}$ each have distinct advantages; $\hat{y}$ integrates the strengths of both to address the challenge of predicting tail data and improve overall performance.
  • ...and 3 more figures
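The VRMP proxy label and the MoE fusion described in the captions can be sketched in the same way. The summary only states that VRMP regresses the prediction bias (PCOC) of aggregated pCTCVR and that an MoE combines the two module outputs, with $\lambda$ (Figure 5) presumably acting as the mixing weight. The sketch therefore assumes PCOC = Σ pCTCVR / observed conversions, that a predicted PCOC is turned back into a count by dividing Σ pCTCVR by it, and a single sigmoid gate producing $\lambda$. All function names are hypothetical, and the regressor and gating network themselves are omitted.

```python
import math


def pcoc(pctcvr_scores, observed_conversions):
    """Assumed PCOC definition: summed predicted probabilities / observed count."""
    return sum(pctcvr_scores) / observed_conversions


def count_from_pcoc(pctcvr_scores, predicted_pcoc):
    """Invert the definition: an estimate of the eventual PCOC implies an
    eventual conversion count of sum(pCTCVR) / PCOC."""
    return sum(pctcvr_scores) / predicted_pcoc


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def moe_combine(y_cls, y_reg, gate_logit):
    """Two-expert mixture (assumed form): lam weights the bucket-classification
    estimate, 1 - lam weights the PCOC-regression estimate."""
    lam = sigmoid(gate_logit)
    return lam * y_cls + (1.0 - lam) * y_reg, lam


if __name__ == "__main__":
    scores = [0.125] * 16                   # hypothetical pCTCVR values, sum = 2.0
    print(pcoc(scores, 4))                  # observed PCOC for 4 conversions
    print(count_from_pcoc(scores, 0.5))     # count implied by a predicted PCOC
    print(moe_combine(6.0, 4.0, 0.0))       # equal weighting when the gate logit is 0
```

Regressing a bounded ratio such as PCOC and converting it back to a count is one way to exploit the caption's observation that PCOC, unlike the raw conversion number, is not long-tailed; whether LDACP performs the conversion exactly this way is not stated here.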