FedUD: Exploiting Unaligned Data for Cross-Platform Federated Click-Through Rate Prediction

Wentao Ouyang, Rui Dong, Ri Tao, Xiangzheng Liu

TL;DR

FedUD addresses cross-platform CTR prediction under privacy constraints with a two-step vertical federated learning (VFL) framework. Step 1 trains on aligned data as in traditional VFL and adds a knowledge distillation module that distills the guest party's high-level representations to guide the learning of a representation transfer network (Rep). Step 2 uses the learned Rep, with its parameters kept frozen, to infer guest representations for the host party's unaligned data, so that aligned and unaligned data are trained jointly. Experiments on the Avazu and Industrial datasets show FedUD achieving state-of-the-art AUC on overall, aligned, and unaligned data, demonstrating that it can leverage unaligned data for more accurate cross-platform CTR models while the raw data stays with each party.

Abstract

Click-through rate (CTR) prediction plays an important role in online advertising platforms. Most existing methods use data from the advertising platform itself for CTR prediction. As user behaviors also exist on many other platforms, e.g., media platforms, it is beneficial to further exploit such complementary information for better modeling user interest and for improving CTR prediction performance. However, due to privacy concerns, data from different platforms cannot be uploaded to a server for centralized model training. Vertical federated learning (VFL) provides a possible solution which is able to keep the raw data on respective participating parties and learn a collaborative model in a privacy-preserving way. However, traditional VFL methods only utilize aligned data with common keys across parties, which strongly restricts their application scope. In this paper, we propose FedUD, which is able to exploit unaligned data, in addition to aligned data, for more accurate federated CTR prediction. FedUD contains two steps. In the first step, FedUD utilizes aligned data across parties like traditional VFL, but it additionally includes a knowledge distillation module. This module distills useful knowledge from the guest party's high-level representations and guides the learning of a representation transfer network. In the second step, FedUD applies the learned knowledge to enrich the representations of the host party's unaligned data such that both aligned and unaligned data can contribute to federated model training. Experiments on two real-world datasets demonstrate the superior performance of FedUD for federated CTR prediction.
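
The two steps described above can be sketched as a forward pass in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the single-layer encoders, the layer sizes, the choice to feed the host representation into the transfer network, the mean-squared-error distillation term, the weight `alpha`, and the unweighted sum of the two Step 2 losses are all hypothetical choices made for readability.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(y, p, eps=1e-7):
    """Binary cross-entropy, the usual CTR training loss."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

# Hypothetical sizes: host features, guest features, representation width.
D_H, D_G, D_R = 8, 6, 4

params = {
    "W_host": rng.normal(scale=0.1, size=(D_H, D_R)),   # host encoder
    "W_guest": rng.normal(scale=0.1, size=(D_G, D_R)),  # guest encoder (stays on the guest party)
    "W_rep": rng.normal(scale=0.1, size=(D_R, D_R)),    # representation transfer network
    "w_top": rng.normal(scale=0.1, size=(2 * D_R,)),    # prediction layer on the host
}

def host_repr(x_h, p):
    return np.tanh(x_h @ p["W_host"])

def guest_repr(x_g, p):
    return np.tanh(x_g @ p["W_guest"])

def transfer(h_h, p):
    """Estimate a guest representation from the host representation (assumed input)."""
    return np.tanh(h_h @ p["W_rep"])

def predict(h_h, h_g, p):
    return sigmoid(np.concatenate([h_h, h_g], axis=1) @ p["w_top"])

def step1_loss(x_h_a, y_a, x_g_a, p, alpha=1.0):
    """Step 1: aligned data only, CTR loss plus an assumed MSE distillation term."""
    h_h, h_g = host_repr(x_h_a, p), guest_repr(x_g_a, p)
    ctr = bce(y_a, predict(h_h, h_g, p))
    distill = float(np.mean((transfer(h_h, p) - h_g) ** 2))
    return ctr + alpha * distill

def step2_loss(x_h_a, y_a, x_g_a, x_h_u, y_u, p):
    """Step 2: aligned samples use real guest representations,
    unaligned samples use representations inferred by the transfer network."""
    h_h_a, h_g_a = host_repr(x_h_a, p), guest_repr(x_g_a, p)
    h_h_u = host_repr(x_h_u, p)
    h_g_u = transfer(h_h_u, p)  # inferred; W_rep is not updated in this step
    return bce(y_a, predict(h_h_a, h_g_a, p)) + bce(y_u, predict(h_h_u, h_g_u, p))

# Parameters an optimizer would be allowed to update in Step 2 (W_rep frozen).
trainable_step2 = [name for name in params if name != "W_rep"]

# Toy data: 4 aligned samples (host + guest features), 5 unaligned (host only).
x_h_a, x_g_a = rng.normal(size=(4, D_H)), rng.normal(size=(4, D_G))
y_a = rng.integers(0, 2, size=4).astype(float)
x_h_u = rng.normal(size=(5, D_H))
y_u = rng.integers(0, 2, size=5).astype(float)

loss1 = step1_loss(x_h_a, y_a, x_g_a, params)
loss2 = step2_loss(x_h_a, y_a, x_g_a, x_h_u, y_u, params)
print(f"step 1 loss: {loss1:.4f}, step 2 loss: {loss2:.4f}")
```

In a real VFL deployment the guest encoder would run on the guest party and only representations and gradients would cross the party boundary; the sketch keeps everything in one process purely to show which inputs each loss term needs.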

Paper Structure

This paper contains 16 sections, 12 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Illustration of the two steps in FedUD. (a) Step 1: Federated learning using aligned data across parties $\{x_H^a, y_H^a, x_G^a\}$ with knowledge distillation. (b) Step 2: Federated learning using both aligned data across parties and unaligned data $\{x_H^u, y_H^u\}$.
  • Figure 2: Overall test AUC vs. the number of the guest party's feature slots. (a) Avazu dataset. (b) Industrial dataset.
  • Figure 3: Overall test AUC vs. the number of unaligned training samples. (a) Avazu dataset. (b) Industrial dataset.
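
The figures report test AUC, and the TL;DR distinguishes overall, aligned, and unaligned data. As a rough sketch of how such a breakdown could be computed, the snippet below uses the pairwise definition of AUC (the fraction of positive/negative pairs ranked correctly, with ties counted as half). The labels, scores, and the `is_aligned` mask are made-up illustration values, not results from the paper.

```python
import numpy as np

def auc(y_true, y_score):
    """Pairwise AUC: P(score of a positive > score of a negative), ties count 0.5."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("AUC needs at least one positive and one negative sample")
    diff = pos[:, None] - neg[None, :]  # every positive/negative pair
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (len(pos) * len(neg)))

# Hypothetical test set: click labels, predicted CTRs, and an alignment flag.
y = np.array([1, 0, 1, 0, 1, 0])
score = np.array([0.9, 0.2, 0.6, 0.7, 0.8, 0.1])
is_aligned = np.array([True, True, True, False, False, False])

auc_overall = auc(y, score)
auc_aligned = auc(y[is_aligned], score[is_aligned])
auc_unaligned = auc(y[~is_aligned], score[~is_aligned])
print(auc_overall, auc_aligned, auc_unaligned)
```

The pairwise form is quadratic in the number of samples, so it is only suitable for small illustrations; a rank-based implementation would be the usual choice at evaluation scale.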