
Federated Cross-Domain Click-Through Rate Prediction With Large Language Model Augmentation

Jiangcheng Qin, Xueyuan Zhang, Baisong Liu, Jiangbo Qian, Yangyang Wang

TL;DR

FedCCTR-LM tackles privacy-preserving cross-domain CTR prediction in heterogeneous, data-sparse environments by combining three core innovations. PrivAugNet uses LLMs to augment items, users, and sequences, mitigating sparsity and aligning cross-domain feature spaces without centralizing raw data. IDST-CL employs independent domain-specific transformers with Intra-Domain Representation Alignment (IDRA) and Cross-Domain Representation Disentanglement (CDRD) to balance domain personalization with cross-domain knowledge transfer, aided by contrastive learning. AdaLDP dynamically tunes gradient-level noise to maintain a favorable privacy-utility trade-off during federated training. Empirical results on four real-world datasets show that FedCCTR-LM surpasses traditional, cross-domain, and federated baselines in CTR accuracy while preserving privacy, with ablations validating the contribution of each module and sensitivity analyses guiding practical deployment.
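The summary does not spell out the IDRA and CDRD objectives. As a generic illustration of the contrastive-learning ingredient only, the Python sketch below implements a standard InfoNCE loss in NumPy: paired representations are pulled together, and the other items in the batch act as negatives. It is not the authors' loss. The function name `info_nce`, the temperature value, and the way positive pairs are built are all assumptions made for illustration.

```python
import numpy as np


def info_nce(anchors: np.ndarray, positives: np.ndarray, temperature: float = 0.1) -> float:
    """Generic InfoNCE (hypothetical stand-in, not the paper's IDRA/CDRD loss).

    Row i of `positives` is the positive pair for row i of `anchors`;
    every other row in the batch serves as an in-batch negative.
    """
    # L2-normalize so that dot products are cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                       # (B, B) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the matching (diagonal) entry as the target class.
    return float(-np.mean(np.diag(log_probs)))


rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))                                # e.g. one domain's user embeddings
aligned = info_nce(z, z + 0.01 * rng.normal(size=z.shape))  # near-identical views
mismatched = info_nce(z, np.roll(z, 1, axis=0))             # every pair is wrong
print(aligned < mismatched)  # True
```

A lower loss for the aligned views than for the shifted pairs is exactly the signal a contrastive objective uses to pull matching representations together.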

Abstract

Accurately predicting click-through rates (CTR) under stringent privacy constraints poses significant challenges, particularly when user-item interactions are sparse and fragmented across domains. Conventional cross-domain CTR (CCTR) methods frequently assume homogeneous feature spaces and rely on centralized data sharing, neglecting complex inter-domain discrepancies and the subtle trade-offs imposed by privacy-preserving protocols. Here, we present Federated Cross-Domain CTR Prediction with Large Language Model Augmentation (FedCCTR-LM), a federated framework that addresses these limitations by jointly coordinating data augmentation, representation disentanglement, and adaptive privacy protection. Our approach integrates three core innovations. First, the Privacy-Preserving Augmentation Network (PrivAugNet) employs large language models to enrich user and item representations and expand interaction sequences, mitigating data sparsity and feature incompleteness. Second, the Independent Domain-Specific Transformer with Contrastive Learning (IDST-CL) module disentangles domain-specific and shared user preferences, employing intra-domain representation alignment (IDRA) and cross-domain representation disentanglement (CDRD) to refine the learned embeddings and enhance knowledge transfer across domains. Finally, the Adaptive Local Differential Privacy (AdaLDP) mechanism dynamically calibrates noise injection to strike a favorable balance between rigorous privacy guarantees and predictive accuracy. Empirical evaluations on four real-world datasets demonstrate that FedCCTR-LM substantially outperforms existing baselines, offering robust, privacy-preserving, and generalizable cross-domain CTR prediction in heterogeneous, federated environments.
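To make the federated privacy step concrete, here is a minimal Python sketch of one training round in which each domain client clips its model update and adds Gaussian noise locally before the server averages the results. The function names, the uniform (unweighted) averaging, and the fixed `noise_multiplier` are illustrative assumptions. The summary does not say how AdaLDP calibrates its noise, so an adaptive scheme would replace the fixed value with a per-round choice.

```python
import numpy as np


def clip_update(update: np.ndarray, clip_norm: float) -> np.ndarray:
    """Scale `update` so that its L2 norm is at most `clip_norm`."""
    norm = np.linalg.norm(update)
    return update * min(1.0, clip_norm / (norm + 1e-12))


def privatize_locally(update, clip_norm, noise_multiplier, rng):
    """Client side: clip, then add N(0, (noise_multiplier * clip_norm)^2 I).

    `noise_multiplier` is held fixed here (assumption); AdaLDP adapts noise,
    but its rule is not given in this summary.
    """
    clipped = clip_update(update, clip_norm)
    return clipped + rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)


def federated_round(global_params, client_updates, clip_norm, noise_multiplier, rng):
    """Server side: apply the plain (unweighted) average of already-noised updates."""
    noisy = [privatize_locally(u, clip_norm, noise_multiplier, rng) for u in client_updates]
    return global_params + np.mean(noisy, axis=0)


rng = np.random.default_rng(42)
updates = [np.array([3.0, 4.0, 0.0, 0.0]), np.array([0.0, 0.0, 0.3, 0.4])]
# With zero noise the result is deterministic: the first update (norm 5) is
# clipped to norm 1, the second (norm 0.5) passes through unchanged.
new_params = federated_round(np.zeros(4), updates, clip_norm=1.0, noise_multiplier=0.0, rng=rng)
print(new_params)  # approximately [0.3, 0.4, 0.15, 0.2]
```

Adding the noise on the client, rather than at the server, is what makes the protection "local": the server only ever sees perturbed updates.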

Paper Structure

This paper contains 53 sections, 4 theorems, 43 equations, 13 figures, 6 tables, 2 algorithms.

Key Result

Lemma 1

Let $f$ be a function with $\ell_2$-sensitivity $\Delta$, and define the Gaussian mechanism as

$$\mathcal{M}(D) = f(D) + \mathcal{N}(0, \sigma^2 \mathbf{I}),$$

where $\mathcal{N}(0, \sigma^2 \mathbf{I})$ denotes the multivariate Gaussian distribution with covariance $\sigma^2 \mathbf{I}$. Then, for any $\alpha \geq 1$, the mechanism satisfies $(\alpha, \epsilon)$-RDP with

$$\epsilon = \frac{\alpha \Delta^2}{2\sigma^2}.$$
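A minimal Python sketch of this lemma follows, assuming the standard closed form $\epsilon = \alpha \Delta^2 / (2\sigma^2)$ from the cited RDP paper. One function releases a noisy answer and the other returns the RDP cost at order $\alpha$. The names `gaussian_mechanism` and `gaussian_rdp_epsilon` are hypothetical, and the query answer `f(D)` is assumed to be precomputed.

```python
import numpy as np


def gaussian_mechanism(value: np.ndarray, sigma: float, rng: np.random.Generator) -> np.ndarray:
    """Release f(D) + N(0, sigma^2 I) for a precomputed query answer f(D)."""
    return value + rng.normal(0.0, sigma, size=value.shape)


def gaussian_rdp_epsilon(alpha: float, sensitivity: float, sigma: float) -> float:
    """RDP epsilon at order alpha for l2-sensitivity Delta: alpha * Delta^2 / (2 * sigma^2)."""
    if alpha < 1:
        raise ValueError("RDP order alpha must be >= 1")
    if sigma <= 0:
        raise ValueError("noise scale sigma must be positive")
    return alpha * sensitivity ** 2 / (2.0 * sigma ** 2)


rng = np.random.default_rng(0)
noisy = gaussian_mechanism(np.array([0.5, -1.0, 2.0]), sigma=2.0, rng=rng)
print(gaussian_rdp_epsilon(alpha=10, sensitivity=1.0, sigma=2.0))  # 1.25
```

Doubling $\sigma$ cuts the RDP cost by a factor of four at every order, which is the lever an adaptive noise scheme trades against accuracy.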

Figures (13)

  • Figure 1: A Toy Illustration of Cross-Domain CTR Prediction Challenges: Non-I.I.D. Data Distributions, Knowledge Transfer Barriers, and Privacy-Utility Trade-offs.
  • Figure 2: The architecture of the proposed FedCCTR-LM framework for federated cross-domain click-through rate (CTR) prediction. This framework integrates three key modules: the Privacy-Preserving Augmentation Network (PrivAugNet), the Independent Domain-Specific Transformer with Contrastive Learning (IDST-CL), and the Adaptive Local Differential Privacy (AdaLDP) mechanism, collectively enabling effective privacy-preserving CCTR prediction.
  • Figure 3: Overview of the Privacy-Preserving Augmentation Network (PrivAugNet) for enhancing CCTR prediction with LLM-based augmentation.
  • Figure 4: Illustration of the three augmentation steps within PrivAugNet.
  • Figure 5: Overview of the IDST-CL Model: Independent Domain-Specific Transformer with Contrastive Learning. The model integrates independent domain-specific transformers for personalized representation learning, combined with contrastive learning techniques to enhance cross-domain knowledge transfer and alignment while preserving domain-specific personalization.
  • ...and 8 more figures

Theorems & Definitions (5)

  • Definition 1: Rényi Differential Privacy mironov2017renyi
  • Lemma 1: RDP of the Gaussian Mechanism mironov2017renyi
  • Lemma 2: Sensitivity of Clipped Gradients
  • Lemma 3: Subsampled Gaussian Mechanism
  • Lemma 4: Conversion from RDP to $(\epsilon, \delta)$-DP
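The lemmas listed above chain into a standard privacy accountant. The Python sketch below composes the Gaussian-mechanism RDP bound over $T$ releases (RDP adds under composition) and converts the total to $(\epsilon, \delta)$-DP with the standard rule $\epsilon = \epsilon_{\mathrm{RDP}} + \log(1/\delta)/(\alpha - 1)$ from Mironov's RDP paper, minimizing over a grid of orders $\alpha$. We assume Lemma 4 takes this form and take the sensitivity of a clipped update to be the clipping norm, one common convention. The sketch deliberately omits the subsampling amplification of Lemma 3, whose exact bound is not reproduced in this summary, so it overstates $\epsilon$. The $\alpha$ grid and function names are illustrative assumptions.

```python
import math


def rdp_to_dp(rdp_eps: float, alpha: float, delta: float) -> float:
    """(alpha, rdp_eps)-RDP implies (rdp_eps + log(1/delta) / (alpha - 1), delta)-DP."""
    return rdp_eps + math.log(1.0 / delta) / (alpha - 1.0)


def composed_gaussian_dp_epsilon(steps, sensitivity, sigma, delta, alphas=None):
    """Best (epsilon, delta)-DP bound for `steps` Gaussian releases over an alpha grid.

    No subsampling amplification is applied, so the bound is conservative.
    """
    if alphas is None:
        alphas = [1.5, 2, 3, 4, 5, 6, 8, 10, 16, 32, 64]  # hypothetical grid, all > 1
    best = math.inf
    for alpha in alphas:
        # RDP composes additively: T releases cost T times the per-step epsilon.
        rdp = steps * alpha * sensitivity ** 2 / (2.0 * sigma ** 2)
        best = min(best, rdp_to_dp(rdp, alpha, delta))
    return best


# One release, clip norm (sensitivity) 1, sigma 1, delta 1e-5.
print(composed_gaussian_dp_epsilon(steps=1, sensitivity=1.0, sigma=1.0, delta=1e-5))
# about 5.30, attained at alpha = 6
```

Minimizing over several orders matters because the best $\alpha$ shifts as $T$, $\sigma$, and $\delta$ change; a fixed order would usually give a looser bound.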