Query2GMM: Learning Representation with Gaussian Mixture Model for Reasoning over Knowledge Graphs

Yuhan Wu, Yuanyuan Xu, Wenjie Zhang, Xiwei Xu, Ying Zhang

TL;DR

Query2GMM represents each query with a GMM embedding based on a univariate Gaussian Mixture Model, which allows precise representation of multiple answer subsets. It also introduces a new similarity measure that assesses the relationships between an entity and a query's multiple answer subsets, enabling effective multi-modal distribution learning for reasoning.

Abstract

Logical query answering over Knowledge Graphs (KGs) is a fundamental yet complex task. A promising approach is to embed queries and entities jointly into the same embedding space. Research along this line suggests that a multi-modal distribution is better suited to representing answer entities than a uni-modal distribution, as a single query may contain multiple disjoint answer subsets due to the compositional nature of multi-hop queries and the varying latent semantics of relations. However, existing methods based on multi-modal distributions represent each subset only roughly, without capturing its accurate cardinality, or even degenerate into uni-modal distribution learning during the reasoning process due to the lack of an effective similarity measure. To better model queries with diversified answers, we propose Query2GMM for answering logical queries over knowledge graphs. In Query2GMM, we present the GMM embedding, which represents each query using a univariate Gaussian Mixture Model (GMM). Each subset of a query is encoded by its cardinality, semantic center, and dispersion degree, allowing for precise representation of multiple subsets. We then design specific neural networks for each operator to handle the inherent complexity that comes with multi-modal distributions while alleviating cascading errors. Finally, we design a new similarity measure to assess the relationships between an entity and a query's multi-answer subsets, enabling effective multi-modal distribution learning for reasoning. Comprehensive experimental results show that Query2GMM outperforms the best competitor by an absolute average of $6.35\%$.
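The encoding of each answer subset by cardinality, semantic center, and dispersion degree can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Component`, `build_gmm`, and `gmm_pdf` are hypothetical, a single scalar dimension stands in for the full embedding, and using normalized cardinalities as mixing weights and a standard deviation as the dispersion degree are assumptions made here for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    weight: float  # normalized cardinality of the answer subset (assumed mixing weight)
    mean: float    # semantic center of the subset
    std: float     # dispersion degree (assumed to be a standard deviation)


def build_gmm(cardinalities, means, stds):
    """Build a mixture whose weights are the cardinalities normalized to sum to 1."""
    total = float(sum(cardinalities))
    return [Component(c / total, m, s) for c, m, s in zip(cardinalities, means, stds)]


def gmm_pdf(gmm, x):
    """Density of the univariate Gaussian mixture at point x."""
    return sum(
        c.weight
        * math.exp(-0.5 * ((x - c.mean) / c.std) ** 2)
        / (c.std * math.sqrt(2 * math.pi))
        for c in gmm
    )


# Hypothetical query with two disjoint answer subsets of 30 and 10 entities.
query = build_gmm(cardinalities=[30, 10], means=[-2.0, 3.0], stds=[0.5, 1.0])
```

In this toy example the larger, tighter subset dominates the density near its center, while a point between the two centers receives little mass from either mode, which is the behavior a uni-modal embedding cannot express.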

Paper Structure

This paper contains 19 sections, 1 theorem, 14 equations, 4 figures, 2 tables.

Key Result

Theorem 1

Given two univariate Gaussian mixture distributions $p_1 = \sum_{i=1}^{m} \alpha_i^1 \phi(\mu_i^1, \sigma_i^1)$ and $p_2 = \sum_{j=1}^{n} \alpha_j^2 \phi(\mu_j^2, \sigma_j^2)$, let $w_{ij}$ be the Wasserstein distance between the Gaussian components $p_{1i}$ and $p_{2j}$. Let $f_{ij}$ be the
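The component-level distance $w_{ij}$ used in Theorem 1 has a well-known closed form if it is taken to be the 2-Wasserstein distance and $\sigma$ is read as a standard deviation: $w_{ij} = \sqrt{(\mu_i^1 - \mu_j^2)^2 + (\sigma_i^1 - \sigma_j^2)^2}$. Both readings are assumptions, since the statement above does not fix the order of the distance. The sketch below computes the pairwise matrix under those assumptions; the function names and the `(alpha, mu, sigma)` tuple layout are hypothetical, and the sketch does not attempt to reproduce the rest of the theorem.

```python
import math


def w2_gaussian(mu1, sigma1, mu2, sigma2):
    """2-Wasserstein distance between N(mu1, sigma1^2) and N(mu2, sigma2^2)."""
    return math.hypot(mu1 - mu2, sigma1 - sigma2)


def pairwise_w2(p1, p2):
    """Matrix of w_ij between components of two mixtures.

    Each mixture is a list of (alpha, mu, sigma) tuples; the weights alpha
    are ignored here because w_ij depends only on the component parameters.
    """
    return [
        [w2_gaussian(mu1, s1, mu2, s2) for (_, mu2, s2) in p2]
        for (_, mu1, s1) in p1
    ]


# Hypothetical mixtures with m = 2 and n = 1 components.
p1 = [(0.75, 0.0, 1.0), (0.25, 3.0, 5.0)]
p2 = [(1.0, 3.0, 5.0)]
w = pairwise_w2(p1, p2)
```

Here `w` has one row per component of `p1` and one column per component of `p2`, so identical components yield a zero entry.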

Figures (4)

  • Figure 1: (a) Visualization of the answer distribution of the query "Who has been nominated for Emmy Award?" (b) Illustration of reasoning of Query2GMM on the query "What are the movies starring non-American actors who have been nominated for Emmy Award?" ($2$-modal distribution).
  • Figure 2: Visualization of logical operator transformation. The input embeddings are represented in light blue and green colors, while the output is shown in deep blue.
  • Figure 3: Comparison of average MRR scores on EPFO queries across two datasets using different GMM components.
  • Figure 4: Query structures with the abbreviation of their computation graph used in the experiments, where 'p', 'i', 'u', and 'n' represent 'projection', 'intersection', 'union', and 'negation'.

Theorems & Definitions (1)

  • Theorem 1