Masked Multi-Domain Network: Multi-Type and Multi-Scenario Conversion Rate Prediction with a Single Model

Wentao Ouyang; Xiuwu Zhang; Chaofeng Guo; Shukui Ren; Yupei Sui; Kun Zhang; Jinmei Luo; Yunfeng Chen; Dongbo Xu; Xiangzheng Liu; Yanlong Du

TL;DR

The Masked Multi-domain Network (MMN) models domain-specific parameters, uses a dynamically weighted loss to account for the loss scale imbalance within each mini-batch, and applies an auto-masking strategy so that mixed data from all domains can be taken as input.

Abstract

In real-world advertising systems, conversions have different types in nature and ads can be shown in different display scenarios, both of which highly impact the actual conversion rate (CVR). This results in the multi-type and multi-scenario CVR prediction problem. A desired model for this problem should satisfy the following requirements: 1) Accuracy: the model should achieve fine-grained accuracy with respect to any conversion type in any display scenario. 2) Scalability: the model parameter size should be affordable. 3) Convenience: the model should not require a large amount of effort in data partitioning, subset processing and separate storage. Existing approaches cannot simultaneously satisfy these requirements. For example, building a separate model for each (conversion type, display scenario) pair is neither scalable nor convenient. Building a unified model trained on all the data with conversion type and display scenario included as two features is not accurate enough. In this paper, we propose the Masked Multi-domain Network (MMN) to solve this problem. To achieve the accuracy requirement, we model domain-specific parameters and propose a dynamically weighted loss to account for the loss scale imbalance issue within each mini-batch. To achieve the scalability requirement, we propose a parameter sharing and composition strategy to reduce model parameters from a product space to a sum space. To achieve the convenience requirement, we propose an auto-masking strategy which can take mixed data from all the domains as input. It avoids the overhead caused by data partitioning, individual processing and separate storage. Both offline and online experimental results validate the superiority of MMN for multi-type and multi-scenario CVR prediction. MMN is now the serving model for real-time CVR prediction in UC Toutiao.
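To make the scalability and convenience ideas above concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the additive composition of a per-type and a per-scenario weight block, the names `W_type`, `W_scene` and `masked_layer`, and all sizes are assumptions chosen for illustration. It shows one way a mixed mini-batch could be pushed through the same operators while one-hot masks select the domain-specific parameters for each instance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 3 conversion types, 4 display scenarios, one layer.
N_TYPES, N_SCENES, D_IN, D_OUT = 3, 4, 8, 5

# Sum-space parameters: one weight block per conversion type and one per
# display scenario, instead of one block per (type, scenario) pair.
# Assumption: a pair's weights are composed by adding its two blocks.
W_type = rng.normal(size=(N_TYPES, D_IN, D_OUT))
W_scene = rng.normal(size=(N_SCENES, D_IN, D_OUT))


def masked_layer(x, type_id, scene_id):
    """Apply domain-specific weights to a mixed mini-batch in one pass."""
    # One-hot masks mark, per instance, which parameter block is active.
    type_mask = np.eye(N_TYPES)[type_id]      # shape (B, N_TYPES)
    scene_mask = np.eye(N_SCENES)[scene_id]   # shape (B, N_SCENES)
    # Every instance meets every block; masked-out blocks contribute 0.
    out_type = np.einsum("bi,tio,bt->bo", x, W_type, type_mask)
    out_scene = np.einsum("bi,sio,bs->bo", x, W_scene, scene_mask)
    return out_type + out_scene


# A mixed batch: instances from different (type, scenario) domains.
x = rng.normal(size=(6, D_IN))
type_id = np.array([0, 2, 1, 0, 2, 1])
scene_id = np.array([3, 0, 1, 1, 2, 3])
out = masked_layer(x, type_id, scene_id)

# Reference: process each instance separately with its composed weights.
ref = np.stack([x[b] @ (W_type[t] + W_scene[s])
                for b, (t, s) in enumerate(zip(type_id, scene_id))])
matches_reference = np.allclose(out, ref)

# Parameter counts for this single layer under the two designs.
product_space = N_TYPES * N_SCENES * D_IN * D_OUT
sum_space = (N_TYPES + N_SCENES) * D_IN * D_OUT
print(matches_reference, product_space, sum_space)
```

In this sketch the masked computation agrees with per-instance processing, so no partitioning of the batch by domain is needed, and the parameter count grows with the sum of the number of types and scenarios rather than with their product.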

Paper Structure (20 sections, 11 equations, 6 figures, 3 tables)

Introduction
Masked Multi-domain Network
Multi-Type and Multi-Scenario CVR Prediction
Modeling Adaptive Parameters
Auto-Masking
Training with Dynamically Weighted Loss
Fast Online Prediction
Experiments
Datasets
Methods in Comparison
Evaluation Metrics
Offline Performance
Accuracy.
Scalability.
Convenience.
...and 5 more sections

Figures (6)

Figure 1: $A$ and $B$ are two input instances. (a) In multi-task learning, domains are the same and tasks are different. One input impacts the model parameters of all tasks. (b) In multi-domain learning, domains are different and tasks are the same. One input impacts the parameters for that domain.
Figure 2: Masked multi-domain network. For simplicity, all the CTR and CVR prediction towers use the multi-layer perceptron consisting of several fully connected layers.
Figure 3: (a) Per-domain CVR tower parameter; per-domain dataset. (b) Parameter sharing and composition; one dataset with auto-masking.
Figure 4: Illustration of how auto-masking can allow a mini-batch of instances from different (conversion type, display scenario) domains to be computed simultaneously with the same operators but different domain-specific parameters. Back propagation is fast because no gradient needs to be computed for masked entries (i.e., setting to 0).
Figure 5: Effect of domain-specific parameters.
...and 1 more figure
