Hierarchies over Vector Space: Orienting Word and Graph Embeddings

Xingzhi Guo; Steven Skiena

TL;DR

This work constructs an arborescence by inserting nodes in descending order of entity power, pointing each entity to the closest more powerful node as its parent, and investigates the effect of insertion order, the power/similarity trade-off and various power sources to optimize parent selection.

Abstract

Word and graph embeddings are widely used in deep learning applications. We present a data structure that captures hierarchical properties inherent in a flat, unordered embedding space, particularly a sense of direction between pairs of entities. Inspired by the notion of distributional generality, our algorithm constructs an arborescence (a directed rooted tree) by inserting nodes in descending order of entity power (e.g., word frequency) and pointing each entity to the closest more powerful node as its parent. We evaluate the resulting tree structures on three tasks: hypernym relation discovery, least-common-ancestor (LCA) discovery among words, and Wikipedia page link recovery. We achieve an average of 8.98% and 2.70% for hypernym and LCA discovery, respectively, across five languages, and 62.76% accuracy on directed Wiki-page link recovery, both substantially above baselines. Finally, we investigate the effect of insertion order, the power/similarity trade-off, and various power sources to optimize parent selection.
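The insertion procedure described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes Euclidean ($l^2$) distance as the closeness measure and a precomputed power score per entity (word frequency in the toy example). The function names `build_arborescence` and `lowest_common_ancestor`, and the toy vectors, are hypothetical.

```python
import numpy as np


def build_arborescence(vectors, power):
    """Return a parent list: parent[i] is the index of node i's parent, or -1 for the root.

    Nodes are inserted in descending order of power; each new node points to the
    nearest (Euclidean) node among those already inserted, i.e. among the more
    powerful nodes. Ties in power are broken by original index (stable sort).
    """
    vectors = np.asarray(vectors, dtype=float)
    order = np.argsort(-np.asarray(power, dtype=float), kind="stable")
    parent = [-1] * len(order)  # the most powerful node becomes the root
    for k in range(1, len(order)):
        node = order[k]
        inserted = order[:k]  # every node already in the tree
        dists = np.linalg.norm(vectors[inserted] - vectors[node], axis=1)
        parent[node] = int(inserted[np.argmin(dists)])
    return parent


def lowest_common_ancestor(parent, a, b):
    """Collect all ancestors of a (including a), then walk up from b until one is hit."""
    ancestors = set()
    while a != -1:
        ancestors.add(a)
        a = parent[a]
    while b not in ancestors:
        b = parent[b]
    return b


# Toy example (made-up 2-D vectors; frequencies stand in for power).
words = ["animal", "dog", "cat", "puppy"]
vecs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.2, 0.1]]
freq = [100, 50, 40, 10]
parent = build_arborescence(vecs, freq)
print([words[p] if p >= 0 else None for p in parent])  # [None, 'animal', 'animal', 'dog']
print(words[lowest_common_ancestor(parent, 3, 2)])     # animal
```

The nearest-neighbor search here is a brute-force scan, so construction is quadratic in the number of nodes; for a full vocabulary one would likely swap in a nearest-neighbor index, which does not change the parent-selection rule.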

Paper Structure

This paper contains 13 sections, 3 equations, 6 figures, and 4 tables.

Introduction
Related Work
Methods
Experiments
Dataset Description
Detailed Analysis on Wiki-people and WordNet (En)
Edge Accuracy vs. Power/Distance
Edge Accuracy vs. Edge Length
Edge Accuracy vs. Node Power
Multilingual WordNets and PCA-induced Word Power
Least Common Ancestors Discovery
LCA Result Analysis:
Conclusions

Figures (6)

Figure 1: Sub-figure (a): 2D PCA projection of words in the GloVe embedding. Sub-figure (b): Solid edges are discovered edges that exist in WordNet, while dotted edges do not. The length associated with each edge reflects the $l^2$ distance between the two words in the embedding space. Finding the directed edges among these words can reveal a meaningful hierarchy.
Figure 4: We select the vocabulary that overlaps with WordNet, roughly 10K words per language. Word frequency decreases from left to right along the horizontal axis. Concurrently, the $l^2$-norm of the word vector tends to decrease, indicating a strong correlation between word frequency and vector length. Values are smoothed using a window of size 50.
Figure 5: Accuracy as $p$ varies over $0 \leq p \leq 1$. Placing more weight on distance than on power yields trees with more qualified edges. "Syn. Acc." reflects the edges that capture synonym word pairs.
Figure 6: Accuracy by edge distance. Edges with smaller distances (low percentiles) are more informative, although the very closest word pairs are likely to be synonyms rather than hypernyms.
Figure 7: Accuracy by node power. Under the preferred descending insertion order (from high to low percentile), directed-edge accuracy increases and then remains steady.
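Figure 5 varies a weight $p$ that trades distance off against power during parent selection, but the exact scoring rule is not given in this summary. The sketch below assumes a hypothetical linear score over max-normalized distance and power; the function `choose_parent` and the normalization scheme are illustrative assumptions, not the paper's formula.

```python
import numpy as np


def choose_parent(node_vec, cand_vecs, cand_power, p):
    """Pick a parent among more powerful candidates using a HYPOTHETICAL linear score.

    score = p * (distance / max distance) - (1 - p) * (power / max power)
    Lower is better. p = 1 ranks candidates by distance alone (the nearest wins);
    p = 0 ranks them by power alone (the most powerful wins).
    """
    diffs = np.asarray(cand_vecs, dtype=float) - np.asarray(node_vec, dtype=float)
    d = np.linalg.norm(diffs, axis=1)
    pw = np.asarray(cand_power, dtype=float)
    d_norm = d / d.max() if d.max() > 0 else d
    pw_norm = pw / pw.max() if pw.max() > 0 else pw
    score = p * d_norm - (1.0 - p) * pw_norm
    return int(np.argmin(score))  # index into the candidate list


# Candidates: "animal", "dog", "cat" (made-up vectors and powers).
cands = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
powers = [100, 50, 40]
for p in (0.0, 0.5, 1.0):
    print(p, choose_parent([1.2, 0.1], cands, powers, p))  # 0.0 -> 0, 0.5 -> 1, 1.0 -> 1
```

Under this assumed score, $p = 1$ recovers the "closest more powerful node" rule, while smaller $p$ pulls nodes toward high-power ancestors, consistent with the caption's observation that weighting distance more heavily yields more qualified edges.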