Low-Dimensional Federated Knowledge Graph Embedding via Knowledge Distillation
Xiaoxiong Zhang, Zhiwei Zeng, Xin Zhou, Chunyan Miao
TL;DR
This work tackles the challenge of deploying efficient knowledge graph embeddings in federated settings by introducing FedKD, a knowledge-distillation-based component that transfers knowledge from a high-dimensional teacher to a low-dimensional student during client-side local training. FedKD adds a soft-label KD loss based on KL divergence and employs Adaptive Asymmetric Temperature Scaling to mitigate teacher over-confidence, while dynamically balancing the hard and soft losses over the course of training. FedKD is evaluated by applying it to the FedE FKGE framework on FB-R3, FB-R5, and FB-R10 with TransE, RotatE, and ComplEx; it halves the embedding dimension (256 to 128) with minimal or no loss in performance, and notable gains in some cases. The method enables practical, privacy-preserving FKGE deployment on resource-constrained devices by reducing communication and storage costs with little or no sacrifice in accuracy.
Abstract
Federated Knowledge Graph Embedding (FKGE) aims to facilitate collaborative learning of entity and relation embeddings from distributed Knowledge Graphs (KGs) across multiple clients while preserving data privacy. Training FKGE models with higher dimensions is typically favored because of their potential to achieve superior performance. However, high-dimensional embeddings pose significant challenges for storage and inference speed. Unlike traditional KG embedding methods, FKGE involves multiple rounds of client-server communication, so communication efficiency is critical. Existing embedding compression methods for traditional KGs may not be directly applicable to FKGE, as they often require training multiple models, which can incur substantial communication costs. In this paper, we propose FedKD, a lightweight component based on Knowledge Distillation (KD) and tailored specifically to FKGE methods. During client-side local training, FedKD trains the low-dimensional student model to mimic the triple score distribution of the high-dimensional teacher model using a KL divergence loss. Unlike conventional KD, FedKD adaptively learns a temperature to scale the scores of positive triples and separately adjusts the scores of the corresponding negative triples using a predefined temperature, thereby mitigating the teacher over-confidence issue. Furthermore, we dynamically adjust the weight of the KD loss to optimize the training process. Extensive experiments on three datasets support the effectiveness of FedKD.
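The distillation step described above can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the temperatures are applied to the teacher scores alone, the student distribution is a plain softmax, the positive-triple temperature `tau_pos` is passed in as a number rather than learned, and the KD weight `kd_weight` is supplied by the caller rather than scheduled. All function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def asymmetric_teacher_probs(pos_score, neg_scores, tau_pos, tau_neg):
    """Teacher distribution over [positive, negatives...].

    Assumption: the positive score is divided by tau_pos (learned in
    FedKD, a plain float here) and each negative score by the fixed
    tau_neg before the softmax.
    """
    scaled = [pos_score / tau_pos] + [s / tau_neg for s in neg_scores]
    return softmax(scaled)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def kd_loss(t_pos, t_negs, s_pos, s_negs, tau_pos, tau_neg):
    """Soft-label loss: KL between scaled teacher and student distributions."""
    teacher = asymmetric_teacher_probs(t_pos, t_negs, tau_pos, tau_neg)
    student = softmax([s_pos] + list(s_negs))  # assumption: no student scaling
    return kl_divergence(teacher, student)


def total_loss(hard_loss, soft_loss, kd_weight):
    """Combine the hard (label) loss with the weighted KD loss.

    Assumption: kd_weight is supplied by the caller; how FedKD adjusts
    it during training is not specified in the abstract.
    """
    return hard_loss + kd_weight * soft_loss


if __name__ == "__main__":
    # An over-confident teacher: the positive score dominates the negatives.
    sharp = asymmetric_teacher_probs(8.0, [0.0, -1.0], tau_pos=1.0, tau_neg=1.0)
    soft = asymmetric_teacher_probs(8.0, [0.0, -1.0], tau_pos=4.0, tau_neg=1.0)
    print("positive prob, tau_pos=1:", sharp[0])
    print("positive prob, tau_pos=4:", soft[0])
    print("KD loss:", kd_loss(8.0, [0.0, -1.0], 1.0, [0.5, 0.0], 4.0, 1.0))
```

In this sketch, raising `tau_pos` above `tau_neg` lowers the probability mass the teacher places on the positive triple, so the student receives a less peaked target. This is one way to read "mitigating teacher over-confidence", and the actual scaling rule in FedKD may differ.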
