
Distributed Representations of Entities in Open-World Knowledge Graphs

Lingbing Guo, Zhuo Chen, Jiaoyan Chen, Yichi Zhang, Zequn Sun, Zhongpo Bo, Yin Fang, Xiaoze Liu, Huajun Chen, Wen Zhang

TL;DR

This work tackles open-world knowledge graphs where new entities continually emerge. It introduces Decentralized Attention Network (DAN), which distributes relational information over neighbor embeddings and uses neighbor context as the query in a second-order attention mechanism, reducing dependence on self-embeddings and enabling induction for unseen entities. A self-distillation objective is proposed to align input and decentralized embeddings via mutual information, supported by theoretical results. Empirically, the approach, implemented in the decentRL framework, achieves state-of-the-art results on conventional entity alignment and entity prediction benchmarks and shows strong gains in open-world settings, with good efficiency and generality across GNN backbones.
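The decentralized attention step can be sketched as follows. This is a minimal, weight-free illustration under stated assumptions, not the authors' decentRL implementation. Mean pooling for the neighbor context, scaled dot-product scoring, and the helper names `dan_layer` and `embed_entity` are all hypothetical choices. The sketch shows the key property: the entity's own embedding is never read, so an entity missing from the embedding table can still be represented through its neighbors.

```python
import math
from typing import Dict, List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dan_layer(neighbor_embs: List[Vector]) -> Vector:
    """One decentralized attention step (sketch); the entity's own embedding is never an input."""
    d = len(neighbor_embs[0])
    # Aggregate the neighbors into a neighbor context (mean pooling is an assumption).
    context = [sum(col) / len(neighbor_embs) for col in zip(*neighbor_embs)]
    # Use the context as the query that scores each neighbor (scaled dot product is an assumption).
    scores = [sum(c * x for c, x in zip(context, n)) / math.sqrt(d) for n in neighbor_embs]
    weights = softmax(scores)
    # The attention-weighted sum of neighbor embeddings is the output embedding.
    return [sum(w * n[j] for w, n in zip(weights, neighbor_embs)) for j in range(d)]


def embed_entity(entity: str, neighbors_of: Dict[str, List[str]], table: Dict[str, Vector]) -> Vector:
    """Hypothetical helper: embeds `entity` from its neighbors only, so it need not be in `table`."""
    return dan_layer([table[n] for n in neighbors_of[entity]])
```

For example, `embed_entity("new", {"new": ["a", "b"]}, {"a": [1.0, 0.0], "b": [0.0, 1.0]})` returns an embedding for `"new"` even though it has no row in the table. A real DAN would add learned projections and multiple heads; they are omitted here to keep the query/key roles visible.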

Abstract

Graph neural network (GNN)-based methods have demonstrated remarkable performance in various knowledge graph (KG) tasks. However, most existing approaches rely on observing all entities during training, posing a challenge in real-world knowledge graphs where new entities emerge frequently. To address this limitation, we introduce Decentralized Attention Network (DAN). DAN leverages neighbor context as the query vector to score the neighbors of an entity, thereby distributing the entity semantics only among its neighbor embeddings. To effectively train a DAN, we introduce self-distillation, a technique that guides the network in generating desired representations. Theoretical analysis validates the effectiveness of our approach. We implement an end-to-end framework and conduct extensive experiments to evaluate our method, showcasing competitive performance on conventional entity alignment and entity prediction tasks. Furthermore, our method significantly outperforms existing methods in open-world settings.
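Self-distillation aligns each input embedding $\mathbf{e}_i$ with its decentralized counterpart $\mathbf{g}_i$ by maximizing their mutual information. The exact bound of Proposition 4.1 is not reproduced on this page, so the sketch below swaps in the standard InfoNCE contrastive estimator, a well-known mutual-information lower bound, purely as an illustration. Cosine similarity, the temperature value, in-batch negatives, and the function names `info_nce_loss` and `mi_lower_bound` are assumptions, not the paper's stated objective.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce_loss(inputs: List[Vector], decentralized: List[Vector], temperature: float = 0.1) -> float:
    """Mean InfoNCE loss (assumed stand-in): each g_i must pick out its own e_i among all e_j in the batch."""
    n = len(inputs)
    total = 0.0
    for i in range(n):
        logits = [cosine(decentralized[i], inputs[j]) / temperature for j in range(n)]
        m = max(logits)
        log_partition = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with e_i as the positive and every other e_j as a negative.
        total += log_partition - logits[i]
    return total / n


def mi_lower_bound(loss: float, batch_size: int) -> float:
    """InfoNCE bound: I(e; g) >= log(N) - loss, where N is the batch size."""
    return math.log(batch_size) - loss
```

Minimizing the loss raises the bound, which can never exceed log N; this is why contrastive estimators of this kind typically benefit from larger batches.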

Paper Structure

This paper contains 42 sections, 2 theorems, 18 equations, 8 figures, 12 tables, and 1 algorithm.

Key Result

Proposition 4.1

In self-distillation, the mutual information between $\mathbf{e}_i$ and $\mathbf{g}_i$ is lower-bounded as follows:

Figures (8)

  • Figure 1: A comparison between KG embedding and word embedding. Left: the KG and the sentence contain the same information. Center: the triplet-based models are similar to Skip-gram, where each neighbor embedding is used to predict the central element. Right: the GNN-based models resemble CBOW, where all neighbor embeddings are aggregated to represent the central element, with the exception that CBOW does not use the self-entity embedding.
  • Figure 2: Insight into multi-layer DAN. a. In the single-layer DAN, we first use an additional aggregation layer to obtain the neighbor context (1-2); we then use the neighbor context as the query to score the neighbors (3); finally, we aggregate the neighbors with the attention scores to obtain the final output embedding (4-5). b. In the multi-layer DAN, we first use the output embedding of W3C at layer $k-1$ as the query to score the output embeddings of its neighbors at layer $k-2$ (1); we then aggregate the neighbor embeddings at layer $k-2$ with the attention scores to obtain the output embedding of W3C at layer $k$ (2-3); similarly, we use the output embedding of W3C at layer $k$ as the query to score the output embeddings of its neighbors at layer $k-1$, and finally use the attention scores to aggregate the neighbor embeddings at layer $k-1$ to obtain the output embedding of W3C at layer $k+1$ (4-6).
  • Figure 3: An illustration of self-distillation. The yellow and orange cells denote the input and decentralized embeddings, respectively.
  • Figure 4: Entity alignment results on open-world DBP15K. Bars with dotted lines denote the performance drop compared with the corresponding results on the conventional datasets.
  • Figure 5: MRR results on open-world FB15K-237, with TransE as the decoder.
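The multi-layer scheme of Figure 2b can be sketched as a recurrence: an entity's output at layer k uses its own layer k-1 output as the query and attends over its neighbors' layer k-2 outputs. This is a hedged reading of the caption, not the authors' code. Treating layer 0 as the input embeddings and layer 1 as a mean-pooled neighbor context (so that layer 2 reproduces the single-layer case of Figure 2a), the scaled dot-product scoring, the absence of learned weights, and the name `multi_layer_dan` are all assumptions.

```python
import math
from typing import Dict, List

Vector = List[float]
Graph = Dict[str, List[str]]  # entity -> neighbors; every entity needs at least one neighbor


def attend(query: Vector, keys: List[Vector]) -> Vector:
    """Scaled dot-product attention where keys double as values (an assumption)."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [sum(e / z * key[j] for e, key in zip(exps, keys)) for j in range(d)]


def multi_layer_dan(graph: Graph, inputs: Dict[str, Vector], num_layers: int) -> List[Dict[str, Vector]]:
    """Returns per-layer embeddings; layers[k][v] is entity v's output at layer k."""
    # Layer 0: input embeddings.
    layers = [dict(inputs)]
    # Layer 1: neighbor context, the mean of each entity's neighbors' input embeddings.
    layers.append({
        v: [sum(col) / len(nbrs) for col in zip(*(inputs[u] for u in nbrs))]
        for v, nbrs in graph.items()
    })
    for k in range(2, num_layers + 1):
        prev, prev2 = layers[k - 1], layers[k - 2]
        # Query: v's own layer k-1 output; keys/values: its neighbors' layer k-2 outputs.
        layers.append({v: attend(prev[v], [prev2[u] for u in nbrs]) for v, nbrs in graph.items()})
    return layers
```

Because layer 1 is pooled from neighbors and layer 2 attends over neighbors, an entity's own input embedding never enters its layer-2 output; it can only reappear at deeper layers through two-hop paths back to the entity.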

Theorems & Definitions (5)

  • Proposition 4.1: lower bound
  • Proposition 4.2: optimization
  • proof
  • proof
  • proof