FedUD: Exploiting Unaligned Data for Cross-Platform Federated Click-Through Rate Prediction
Wentao Ouyang, Rui Dong, Ri Tao, Xiangzheng Liu
TL;DR
FedUD addresses cross-platform CTR prediction under privacy constraints with a two-step vertical federated learning framework. Step 1 trains on aligned data and uses knowledge distillation to train a representation transfer network (Rep) that distills guest-party representations into the host space. Step 2 uses the learned Rep network, with its parameters kept frozen, to infer guest representations for unaligned host data, so that aligned and unaligned data are trained on jointly. Experiments on the Avazu and Industrial datasets show FedUD achieving state-of-the-art AUC on overall, aligned, and unaligned data, indicating that it can leverage unaligned data without compromising privacy and yield more accurate CTR models across platforms.
Abstract
Click-through rate (CTR) prediction plays an important role in online advertising platforms. Most existing methods use data from the advertising platform itself for CTR prediction. As user behaviors also exist on many other platforms, e.g., media platforms, it is beneficial to exploit such complementary information to better model user interest and improve CTR prediction performance. However, due to privacy concerns, data from different platforms cannot be uploaded to a server for centralized model training. Vertical federated learning (VFL) provides a possible solution: it keeps the raw data on the respective participating parties and learns a collaborative model in a privacy-preserving way. However, traditional VFL methods utilize only aligned data with common keys across parties, which strongly restricts their application scope. In this paper, we propose FedUD, which exploits unaligned data, in addition to aligned data, for more accurate federated CTR prediction. FedUD contains two steps. In the first step, FedUD utilizes aligned data across parties like traditional VFL, but it additionally includes a knowledge distillation module. This module distills useful knowledge from the guest party's high-level representations and guides the learning of a representation transfer network. In the second step, FedUD applies the learned knowledge to enrich the representations of the host party's unaligned data, such that both aligned and unaligned data can contribute to federated model training. Experiments on two real-world datasets demonstrate the superior performance of FedUD for federated CTR prediction.
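The two steps can be illustrated with a minimal forward-pass sketch in NumPy. It is not the authors' implementation. Several details are assumptions made only for illustration: the single-layer encoders, the use of the host representation as input to the representation transfer network, the mean-squared-error distillation loss, the weight `alpha`, and all function and variable names. Both parties run in one process here, whereas real VFL would exchange only representations and gradients, never raw features.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b):
    """One tanh layer, standing in for a deeper encoder (assumption)."""
    return np.tanh(x @ w + b)

def log_loss(y, p, eps=1e-7):
    """Binary cross-entropy for click labels."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def init(d_host=8, d_guest=6, d_rep=4):
    """Hypothetical parameter layout: two encoders, transfer net, top layer."""
    def layer(n_in, n_out):
        return rng.normal(0, 0.1, (n_in, n_out)), np.zeros(n_out)
    return {
        "host_enc": layer(d_host, d_rep),
        "guest_enc": layer(d_guest, d_rep),
        "transfer": layer(d_rep, d_rep),   # representation transfer network
        "top": layer(2 * d_rep, 1),        # prediction layer on the host
    }

def predict(params, h_host, h_guest):
    """Predicted CTR from concatenated host and guest representations."""
    w, b = params["top"]
    z = np.concatenate([h_host, h_guest], axis=1) @ w + b
    return 1.0 / (1.0 + np.exp(-z[:, 0]))

def step1_loss(params, x_host, x_guest, y, alpha=1.0):
    """Step 1: aligned data only, CTR loss plus distillation loss."""
    h_host = dense(x_host, *params["host_enc"])
    h_guest = dense(x_guest, *params["guest_enc"])    # guest-side output
    h_guest_hat = dense(h_host, *params["transfer"])  # imitates h_guest
    ctr = log_loss(y, predict(params, h_host, h_guest))
    distill = float(np.mean((h_guest_hat - h_guest) ** 2))  # assumed MSE
    return ctr + alpha * distill

def step2_loss(params, aligned, x_host_unaligned, y_unaligned):
    """Step 2: aligned and unaligned data both contribute to the CTR loss.

    A training loop would leave params["transfer"] out of the optimizer
    here, so the transfer network stays frozen.
    """
    x_host, x_guest, y = aligned
    h_host = dense(x_host, *params["host_enc"])
    h_guest = dense(x_guest, *params["guest_enc"])
    loss_aligned = log_loss(y, predict(params, h_host, h_guest))
    # Unaligned host samples have no guest features, so the guest
    # representation is inferred from the host representation.
    h_host_u = dense(x_host_unaligned, *params["host_enc"])
    h_guest_u = dense(h_host_u, *params["transfer"])
    loss_unaligned = log_loss(y_unaligned, predict(params, h_host_u, h_guest_u))
    return loss_aligned + loss_unaligned

# Toy data: 5 aligned samples and 3 unaligned host-only samples.
params = init()
x_h, x_g = rng.normal(size=(5, 8)), rng.normal(size=(5, 6))
y = rng.integers(0, 2, 5)
x_u, y_u = rng.normal(size=(3, 8)), rng.integers(0, 2, 3)
loss1 = step1_loss(params, x_h, x_g, y)
loss2 = step2_loss(params, (x_h, x_g, y), x_u, y_u)
```

The sketch computes losses only. An actual implementation would minimize `loss1` over all parameters in the first step and then minimize `loss2` with the transfer parameters held fixed, using an automatic-differentiation framework.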
