Hierarchical Blockmodelling for Knowledge Graphs

Marcin Pietrasik, Marek Reformat, Anna Wilbik

TL;DR

This paper proposes a stochastic blockmodel that integrates the Nested Chinese Restaurant Process and the Stick Breaking Process into its generative model, and derives a collapsed Gibbs sampling scheme for its inference. The model is capable of inducing coherent cluster hierarchies in small-scale settings.

Abstract

In this paper, we investigate the use of probabilistic graphical models, specifically stochastic blockmodels, for the purpose of hierarchical entity clustering on knowledge graphs. These models, seldom used in the Semantic Web community, decompose a graph into a set of probability distributions. The parameters of these distributions are then inferred, allowing for their subsequent sampling to generate a random graph. In a non-parametric setting, this allows for the induction of hierarchical clusterings without prior constraints on the hierarchy's structure. Specifically, this is achieved by the integration of the Nested Chinese Restaurant Process and the Stick Breaking Process into the generative model. We propose a model leveraging this integration and derive a collapsed Gibbs sampling scheme for its inference. To aid in understanding, we describe the steps in this derivation and provide an implementation of the sampler. We evaluate our model on synthetic and real-world datasets and quantitatively compare it against benchmark models. We further evaluate our results qualitatively and find that our model is capable of inducing coherent cluster hierarchies in small-scale settings. The work presented in this paper provides a first step toward the further application of stochastic blockmodels to knowledge graphs on a larger scale. We conclude the paper with potential avenues for future work on more scalable inference schemes.

Paper Structure (33 sections, 35 equations, 11 figures, 2 tables, 1 algorithm)

Figures (11)

  • Figure 1: Toy example of a knowledge graph and how it may be modelled by a stochastic blockmodel. Starting from the top-left quadrant and proceeding clockwise: graphical representation of a knowledge graph with entities $e_0$ through $e_7$ and predicates $r_0$ through $r_2$; graphical representation of the same knowledge graph as modelled by a stochastic blockmodel with communities $t_0$ through $t_2$; a potential community relations tensor induced by the stochastic blockmodel; the adjacency tensor of the knowledge graph above it.
  • Figure 2: Toy example of the CRP after seating patrons $e_0$ through $e_5$. Tables $t_0$ through $t_2$ are occupied and table $t_3$ is the next unoccupied table. We illustrate Equation \ref{eqn:crp} by calculating the probabilities of seating patron $e_6$ at tables $t_0$ and $t_3$: $\mathbbm{P}(e_6 = t_0) = \frac{3}{6 + \gamma}$ and $\mathbbm{P}(e_6 = t_3) = \frac{\gamma}{6 + \gamma}$.
  • Figure 3: Toy example of an nCRP truncated to a depth of $L = 2$ after assigning patrons $e_0$ through $e_5$. Solid lines indicate paths which have been taken by patrons and thus exist in the tree, whereas dashed lines indicate potential paths. We illustrate Equation \ref{eqn:ncrp} by calculating the probabilities of patron $e_6$ taking a path through communities $t_2$ and $t_9$: $\mathbbm{P}(e_6 = t_2) = (\frac{2}{2 + \gamma})( \frac{2}{6 + \gamma})$ and $\mathbbm{P}(e_6 = t_9) = \frac{\gamma}{6 + \gamma}$.
  • Figure 4: Toy example of the stick breaking process with values $v^1 = 0.125$, $v^2 = 0.25$, and $v^3 = 0.5$. Starting at the top of the figure, a unit length stick is broken at $v^1$. The remainder is then iteratively broken proportionally to draws from the Beta distribution.
  • Figure 5: Toy example depicting a potential hierarchy induced by our model. The table on the right side captures the path and level sampled for each entity in the knowledge graph as well as its corresponding community. The left side provides a visualization of this hierarchy.
  • ...and 6 more figures
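
The arithmetic in the captions for Figures 2 through 4 can be reproduced with a short Python sketch. This is only an illustration under stated assumptions, not the authors' sampler: the function names are hypothetical, the table occupancy `[3, 2, 1]` is assumed (the caption only implies that $t_0$ seats three of the six patrons), and $\gamma = 1$ is an arbitrary choice. The nCRP path probability is taken to be the product of per-level CRP choices, which matches the form of the expression given for Figure 3.

```python
def crp_seating_probabilities(table_counts, gamma):
    """CRP: probability of joining each occupied table, then a new table (last entry)."""
    denom = sum(table_counts) + gamma
    probs = [count / denom for count in table_counts]
    probs.append(gamma / denom)  # mass reserved for the next unoccupied table
    return probs


def ncrp_path_probability(level_choices, gamma):
    """nCRP: product of per-level CRP probabilities along an existing path.

    level_choices holds (patrons_at_chosen_child, patrons_at_parent) per level.
    """
    prob = 1.0
    for at_child, at_parent in level_choices:
        prob *= at_child / (at_parent + gamma)
    return prob


def stick_breaking_weights(break_fractions):
    """Break a unit stick: each fraction is taken from what remains of the stick."""
    weights = []
    remaining = 1.0
    for v in break_fractions:
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights, remaining


if __name__ == "__main__":
    gamma = 1.0  # assumed concentration parameter, for illustration only

    # Figure 2: six seated patrons; the [3, 2, 1] split is an assumption.
    print(crp_seating_probabilities([3, 2, 1], gamma))

    # Figure 3: (2 / (6 + gamma)) * (2 / (2 + gamma)), as in the caption.
    print(ncrp_path_probability([(2, 6), (2, 2)], gamma))

    # Figure 4: v^1 = 0.125, v^2 = 0.25, v^3 = 0.5.
    print(stick_breaking_weights([0.125, 0.25, 0.5]))
```

In the stick breaking case the three segment lengths and the leftover piece sum to one, which is why the weights can serve as a probability distribution over levels.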