Efficient Transfer Learning Framework for Cross-Domain Click-Through Rate Prediction
Qi Liu, Xingyuan Tang, Jianqiang Huang, Xiangqian Yu, Haoran Jin, Jin Chen, Yuanhao Pu, Defu Lian, Tan Qu, Zhe Wang, Jia Cheng, Jun Lei
TL;DR
The paper tackles data sparsity and catastrophic forgetting in cross-domain CTR prediction by introducing E-CDCTR, a tri-level asynchronous transfer framework that separately pre-trains on long-term natural data (TPM) and short-term natural data (CPM), then adapts to advertisement data (A-CTR). TPM provides long-term user/item signals via historical embeddings; CPM is trained with rich features and transfers its parameters to initialize the ad CTR model; A-CTR fine-tunes on ad data using the TPM embeddings and the CPM initialization. Offline experiments on a Meituan industrial dataset show consistent GAUC improvements, and online A/B testing shows gains in CTR and RPM, confirming practical impact. The approach emphasizes efficient pre-training with minimal data movement, robust handling of distribution shifts via BN reinitialization, and embedding aggregation to reduce online latency. Overall, E-CDCTR offers a scalable, practical solution for cross-domain CTR transfer in production systems with uneven data distributions.
Abstract
Natural content and advertisements coexist in industrial recommendation systems but differ in data distribution. Concretely, advertisement traffic is considerably sparser than natural-content traffic, which motivates transferring knowledge from the richer natural-content source domain to the sparser advertising domain. Two challenges arise: the inefficiency of managing extensive source data, and the 'catastrophic forgetting' that results from the CTR model's daily updates. To this end, we propose a novel tri-level asynchronous framework, the Efficient Transfer Learning Framework for Cross-Domain Click-Through Rate Prediction (E-CDCTR), to transfer comprehensive knowledge of natural content to advertisement CTR models. The framework consists of three key components: the Tiny Pre-training Model (TPM), which trains a tiny CTR model with several basic features on long-term natural data; the Complete Pre-training Model (CPM), which trains a CTR model with the same network structure and input features as the target advertisement model on short-term natural data; and the Advertisement CTR model (A-CTR), which takes its parameter initialization from the CPM, uses multiple historical embeddings from the TPM as extra features, and is then fine-tuned on advertisement data. The TPM provides richer user and item representations for both the CPM and the A-CTR, effectively alleviating the forgetting problem inherent in daily updates. The CPM further enhances the advertisement model by providing a knowledgeable initialization, thereby alleviating the data sparsity that advertising CTR models typically encounter. This tri-level cross-domain transfer learning framework offers an efficient solution to both data sparsity and 'catastrophic forgetting', yielding remarkable improvements.
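The tri-level data flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: a small logistic regression stands in for the deep CTR models, mean pooling is an assumed way to aggregate the TPM's historical embeddings, and every name (train_logreg, aggregate_history, build_input, tpm_history) and every numeric value is hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(init_weights, rows, labels, lr=0.1, epochs=20):
    """SGD logistic regression; returns new weights and leaves init_weights untouched."""
    w = list(init_weights)
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i, xi in enumerate(x):
                w[i] -= lr * (p - y) * xi
    return w

def aggregate_history(snapshots):
    """Mean-pool historical embedding snapshots (the pooling choice is an assumption)."""
    n = len(snapshots)
    return [sum(vec[i] for vec in snapshots) / n for i in range(len(snapshots[0]))]

def build_input(base_features, tpm_snapshots):
    """Concatenate a model's own features with the aggregated TPM embedding."""
    return list(base_features) + aggregate_history(tpm_snapshots)

# Level 1 (TPM): stand-in for embeddings exported by a tiny model trained on
# long-term natural data, one snapshot per past period (values are made up).
tpm_history = {
    "u1": [[0.2, 0.4], [0.4, 0.8]],
    "u2": [[-0.6, 0.0], [-0.2, 0.2]],
}

# Level 2 (CPM): same input layout as the ad model, trained on natural data.
natural_rows = [
    build_input([1.0, 0.5], tpm_history["u1"]),
    build_input([1.0, -0.5], tpm_history["u2"]),
]
natural_labels = [1, 0]
cpm_weights = train_logreg([0.0] * 4, natural_rows, natural_labels)

# Level 3 (A-CTR): start from the CPM parameters, then fine-tune on sparse ad data.
actr_init = list(cpm_weights)
ad_rows = [build_input([1.0, 0.3], tpm_history["u1"])]
ad_labels = [0]
actr_weights = train_logreg(actr_init, ad_rows, ad_labels, epochs=5)
```

Because the CPM and the A-CTR share one input layout in this sketch, the CPM weights can be copied over directly. That is the reason the abstract requires the CPM to have the same network structure and input features as the target advertisement model.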
