DSparsE: Dynamic Sparse Embedding for Knowledge Graph Completion
Chuhong Yang, Bin Li, Nan Wu
TL;DR
DSparsE tackles knowledge graph completion by addressing overfitting and limited feature interaction through a dynamic sparse encoder (dynamic layer + relation-aware layer) and a deep residual decoder. All dense layers are replaced with sparse MLPs to preserve expressivity while reducing parameter count, and a gating mechanism enables expert-style, input-dependent fusion. Empirical results on FB15k-237, WN18RR, and YAGO3-10 show state-of-the-art or competitive performance, and ablations confirm the critical roles of the encoder components, residual decoding, and sparsity settings. The gating outputs also exhibit semantic clustering, suggesting a meaningful organization of entity-relation patterns, and the architecture scales robustly to deeper networks.
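The gated, expert-style fusion described above can be illustrated with a small pure-Python sketch. Everything in it is an assumption for illustration rather than the authors' implementation: the hypothetical class names `SparseLinear` and `DynamicLayer`, the softmax gate with a `temperature` parameter, the uniform weight initialization, and the fixed random mask that keeps each connection with probability `1 - sparsity`. Each "expert" is a sparse linear layer, and the gate turns the input into mixing weights that sum to one.

```python
import math
import random


class SparseLinear:
    """Hypothetical linear layer whose weights are zeroed by a fixed random mask."""

    def __init__(self, n_in, n_out, sparsity, rng):
        # Keep each connection with probability 1 - sparsity; the mask is never changed later.
        self.mask = [[1.0 if rng.random() >= sparsity else 0.0 for _ in range(n_in)]
                     for _ in range(n_out)]
        scale = 1.0 / math.sqrt(n_in)
        self.weight = [[rng.uniform(-scale, scale) * m for m in row] for row in self.mask]
        self.bias = [0.0] * n_out

    def __call__(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weight, self.bias)]


def softmax(z, temperature=1.0):
    # Numerically stable softmax; the temperature knob is an assumption.
    scaled = [v / temperature for v in z]
    m = max(scaled)
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


class DynamicLayer:
    """Hypothetical gate-weighted mixture of K sparse experts (assumed structure)."""

    def __init__(self, n_in, n_out, n_experts, sparsity, rng, temperature=1.0):
        self.experts = [SparseLinear(n_in, n_out, sparsity, rng) for _ in range(n_experts)]
        self.gate = SparseLinear(n_in, n_experts, 0.0, rng)  # sparsity 0.0 -> dense gate
        self.temperature = temperature
        self.last_gate = None  # stored so gate outputs can be inspected afterwards

    def __call__(self, x):
        g = softmax(self.gate(x), self.temperature)  # input-dependent expert weights
        self.last_gate = g
        outs = [expert(x) for expert in self.experts]
        # Output j is the gate-weighted sum of every expert's j-th output.
        return [sum(g[k] * outs[k][j] for k in range(len(outs)))
                for j in range(len(outs[0]))]


rng = random.Random(0)
layer = DynamicLayer(n_in=8, n_out=4, n_experts=3, sparsity=0.5, rng=rng)
x = [rng.gauss(0.0, 1.0) for _ in range(8)]  # stand-in for a hypothetical entity-relation input
y = layer(x)
print(len(y), [round(p, 3) for p in layer.last_gate])
```

Because `last_gate` is recorded per input, collecting it over many entity-relation pairs is one way to look for the kind of clustering in gating outputs mentioned in the TL;DR.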
Abstract
Addressing the incompleteness of knowledge graphs remains a significant challenge, and current knowledge graph completion methods have clear limitations. For example, ComDensE is prone to overfitting and suffers performance degradation as network depth increases, while InteractE is limited in feature interaction and interpretability. To address these issues, we propose a new method called dynamic sparse embedding (DSparsE) for knowledge graph completion. The proposed model embeds each input entity-relation pair with a shallow encoder composed of a dynamic layer and a relation-aware layer. The outputs of these two layers are then concatenated and passed through a projection layer and a deep decoder with residual connections. This design keeps the network robust while preserving its capability for feature extraction. Furthermore, the conventional dense layers are replaced by randomly initialized sparse connection layers, which mitigates overfitting. Finally, comprehensive experiments are conducted on the FB15k-237, WN18RR, and YAGO3-10 datasets. The results show that the proposed method achieves state-of-the-art Hits@1 performance compared with existing baseline approaches. An ablation study examines the effects of the dynamic layer and the relation-aware layer, and shows that the model combining both achieves the best performance.
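The data flow in the abstract (concatenate the two encoder outputs, project, then decode through residual sparse layers) can be sketched in pure Python as follows. This is an illustrative assumption, not the authors' code: the names `SparseLinear`, `ResidualDecoder`, and `score_tails` are hypothetical, the encoder outputs are random stand-ins, each residual block is assumed to compute `x + ReLU(W x)`, and the final ConvE-style scoring (a sigmoid of the dot product with every candidate entity embedding) is an assumption the abstract does not state.

```python
import math
import random


class SparseLinear:
    """Hypothetical linear layer with a fixed random connectivity mask (no bias)."""

    def __init__(self, n_in, n_out, sparsity, rng):
        # Keep each connection with probability 1 - sparsity.
        self.mask = [[1.0 if rng.random() >= sparsity else 0.0 for _ in range(n_in)]
                     for _ in range(n_out)]
        scale = 1.0 / math.sqrt(n_in)
        self.weight = [[rng.uniform(-scale, scale) * m for m in row] for row in self.mask]

    def __call__(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weight]


def relu(v):
    return [max(0.0, a) for a in v]


class ResidualDecoder:
    """Stack of `depth` sparse blocks, each assumed to compute x + ReLU(W x)."""

    def __init__(self, dim, depth, sparsity, rng):
        self.blocks = [SparseLinear(dim, dim, sparsity, rng) for _ in range(depth)]

    def __call__(self, x):
        for block in self.blocks:
            # The skip connection carries x forward even if a block contributes nothing.
            x = [a + b for a, b in zip(x, relu(block(x)))]
        return x


def score_tails(h, entity_table):
    # Assumed 1-N scoring: sigmoid of the dot product with every candidate entity embedding.
    return [1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(h, e)))) for e in entity_table]


rng = random.Random(1)
dim = 6
dynamic_out = [rng.gauss(0.0, 1.0) for _ in range(dim)]   # stand-in for the dynamic layer output
relation_out = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # stand-in for the relation-aware output
projection = SparseLinear(2 * dim, dim, sparsity=0.5, rng=rng)
decoder = ResidualDecoder(dim, depth=4, sparsity=0.5, rng=rng)

h = decoder(projection(dynamic_out + relation_out))  # concatenate -> project -> decode
entities = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(5)]
scores = score_tails(h, entities)
print([round(s, 3) for s in scores])
```

One intuition for the residual design: with `sparsity=1.0` every block's weights are zero, so each block adds nothing and the decoder reduces to the identity map instead of destroying the signal. Extra depth can therefore default to "do no harm", which is consistent with the abstract's claim of robustness, though this sketch does not prove it.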
