Federated Cross-Domain Click-Through Rate Prediction With Large Language Model Augmentation
Jiangcheng Qin, Xueyuan Zhang, Baisong Liu, Jiangbo Qian, Yangyang Wang
TL;DR
FedCCTR-LM tackles privacy-preserving cross-domain CTR prediction in heterogeneous, data-sparse environments by combining three core innovations. PrivAugNet uses LLMs to augment item and user representations and interaction sequences, mitigating sparsity and aligning cross-domain feature spaces without centralizing raw data. IDST-CL employs independent domain-specific transformers with Intra-Domain Representation Alignment (IDRA) and Cross-Domain Representation Disentanglement (CDRD) to balance domain personalization with cross-domain knowledge transfer, aided by contrastive learning. AdaLDP dynamically tunes gradient-level noise to maintain a favorable privacy-utility trade-off during federated training. Empirical results on four real-world datasets show that FedCCTR-LM surpasses traditional, cross-domain, and federated baselines in CTR accuracy while preserving privacy, with ablations validating the contribution of each module and sensitivity analyses guiding practical deployment.
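The summary does not specify AdaLDP's noise rule. The sketch below only illustrates the general idea of gradient-level noise whose scale adapts during training. It uses a common clip-then-add-Gaussian-noise pattern with a clipping bound that tracks an exponential moving average of observed gradient norms. The class name `AdaptiveGaussianPerturber`, the EMA rule, and all parameters are hypothetical assumptions, not the paper's method. The sketch also performs no privacy accounting: adapting the bound from raw gradient norms would itself consume privacy budget in a rigorous LDP analysis.

```python
import math
import random
from typing import List, Optional


def l2_norm(vec: List[float]) -> float:
    """Euclidean norm of a flat gradient vector."""
    return math.sqrt(sum(v * v for v in vec))


def clip(vec: List[float], bound: float) -> List[float]:
    """Scale vec down so its L2 norm is at most bound (no-op if already within)."""
    norm = l2_norm(vec)
    if norm <= bound or norm == 0.0:
        return list(vec)
    scale = bound / norm
    return [v * scale for v in vec]


class AdaptiveGaussianPerturber:
    """Hypothetical client-side gradient perturber (illustrative, not AdaLDP itself)."""

    def __init__(self, noise_multiplier: float = 1.0, init_bound: float = 1.0,
                 ema: float = 0.9, seed: Optional[int] = None) -> None:
        self.noise_multiplier = noise_multiplier  # noise std relative to the bound
        self.bound = init_bound                   # current clipping bound C_t
        self.ema = ema                            # smoothing factor for bound updates
        self.rng = random.Random(seed)

    def perturb(self, grad: List[float]) -> List[float]:
        # 1) Bound each client's contribution by clipping to C_t.
        clipped = clip(grad, self.bound)
        # 2) Gaussian noise with std proportional to the current bound.
        sigma = self.noise_multiplier * self.bound
        noisy = [g + self.rng.gauss(0.0, sigma) for g in clipped]
        # 3) Adapt the bound for the next round toward the observed gradient norm
        #    (assumed rule; its privacy cost is not accounted for here).
        self.bound = self.ema * self.bound + (1.0 - self.ema) * l2_norm(grad)
        return noisy
```

For example, with `noise_multiplier=0.0` the gradient `[3.0, 4.0]` is clipped to roughly `[0.6, 0.8]`, and the bound moves from 1.0 toward the observed norm of 5.0. A larger bound lets more signal through but forces proportionally more noise, which is the privacy-utility tension that an adaptive scheme is meant to manage.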
Abstract
Accurately predicting click-through rates (CTR) under stringent privacy constraints poses profound challenges, particularly when user-item interactions are sparse and fragmented across domains. Conventional cross-domain CTR (CCTR) methods frequently assume homogeneous feature spaces and rely on centralized data sharing, neglecting complex inter-domain discrepancies and the subtle trade-offs imposed by privacy-preserving protocols. Here, we present Federated Cross-Domain CTR Prediction with Large Language Model Augmentation (FedCCTR-LM), a federated framework engineered to address these limitations by jointly coordinating data augmentation, representation disentanglement, and adaptive privacy protection. Our approach integrates three core innovations. First, the Privacy-Preserving Augmentation Network (PrivAugNet) employs large language models to enrich user and item representations and expand interaction sequences, mitigating data sparsity and feature incompleteness. Second, the Independent Domain-Specific Transformer with Contrastive Learning (IDST-CL) module disentangles domain-specific and shared user preferences, employing intra-domain representation alignment (IDRA) and cross-domain representation disentanglement (CDRD) to refine the learned embeddings and enhance knowledge transfer across domains. Finally, the Adaptive Local Differential Privacy (AdaLDP) mechanism dynamically calibrates noise injection to strike a favorable balance between rigorous privacy guarantees and predictive accuracy. Empirical evaluations on four real-world datasets demonstrate that FedCCTR-LM substantially outperforms existing baselines, offering robust, privacy-preserving, and generalizable cross-domain CTR prediction in heterogeneous, federated environments.
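The abstract says IDST-CL uses contrastive learning to align representations but does not give the objective. A standard choice for this kind of alignment is an InfoNCE loss over paired embeddings. In the sketch below, each anchor is pulled toward its own positive and pushed away from the other positives in the batch. Pairing a user's domain-specific embedding with an augmented view of the same user is an illustrative assumption. The function names and the default temperature of 0.1 are hypothetical, and the sketch is not the paper's exact IDRA or CDRD formulation.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def info_nce(anchors: List[Sequence[float]], positives: List[Sequence[float]],
             temperature: float = 0.1) -> float:
    """Mean InfoNCE loss: row i's positive is positives[i]; all others are negatives."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        # Similarity logits between anchor i and every candidate in the batch.
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching index i as the target class.
        total += log_sum - logits[i]
    return total / n
```

For example, with anchors `[[1, 0], [0, 1]]`, correctly matched positives give a loss near zero, while swapping the positives gives a loss of about 10. When every candidate is equally similar, the loss equals `log(n)`, the value expected for uninformative embeddings.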
