Interpreting Node Embedding Distances Through $n$-order Proximity Neighbourhoods
Dougal Shakespeare, Camille Roth
TL;DR
The paper addresses the interpretation of inter-node distances in embedding spaces by linking them to $n$-hop proximities in the graph. It introduces three proximity networks, $S$, $P$, and $H$, and the Neighbourhood Attraction Score, which quantifies how well embedding distances preserve proximity structure. An interpretability score $I$, computed via histogram-based Jensen–Shannon divergence across proximity classes, enables cross-model comparison of how well distances reflect 1st-, 2nd-, and higher-order relations. Across two Deezer co-occurrence networks, the matrix-factorisation model SVD emerges as the most interpretable for inter-node distances, even for higher-order proximities, while DeepWalk and node2vec are less interpretable and SDNE$_S$ can be favourable for higher-order proximities in some cases. These findings inform model choice when the interpretability of distances is crucial, and they highlight potential redundancy in optimising 1st-order proximities, since these correlate with higher-order relations.
Abstract
In the field of node representation learning, the task of interpreting latent dimensions has become a prominent, well-studied research topic. This work instead appraises the interpretability of another, rarely exploited feature of node embeddings that is increasingly utilised in recommendation and consumption diversity studies: inter-node embedded distances. We introduce a new method to measure how understandable the distances between nodes are, assessing how well the proximity weights derived from a network before embedding relate to the node closeness measurements after embedding. Testing several classical node embedding models, our findings reach a conclusion familiar to practitioners, albeit rarely cited in the literature: the matrix factorisation model SVD is the most interpretable through 1st-, 2nd- and even higher-order proximities.
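
To make the idea of comparing embedded distances across proximity classes concrete, the following is a minimal sketch and not the authors' implementation. It assumes an unweighted toy graph, uses plain hop count as the proximity class (standing in for the paper's proximity networks, whose exact definitions are not given in this summary), and compares the distance histograms of two classes with the Jensen–Shannon divergence. All function names, the binning, and the toy embedding are hypothetical choices made for illustration.

```python
import math
from collections import deque

def hop_distances(adj, source):
    """BFS hop counts from `source` in an unweighted graph {node: set(neighbours)}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def distances_by_hop(adj, emb):
    """Group Euclidean embedding distances of node pairs by their hop count."""
    classes = {}
    nodes = sorted(adj)
    for i, u in enumerate(nodes):
        hops = hop_distances(adj, u)
        for v in nodes[i + 1:]:
            if v in hops:  # skip pairs in different components
                classes.setdefault(hops[v], []).append(math.dist(emb[u], emb[v]))
    return classes

def histogram(values, bins, lo, hi):
    """Normalised histogram over `bins` equal-width bins on [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for x in values:
        counts[min(int((x - lo) / width), bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]

def js_divergence(p, q):
    """Jensen-Shannon divergence in base 2, bounded in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Toy path graph 0-1-2-3 with a hypothetical embedding placing nodes on a line.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
emb = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0), 3: (3.0, 0.0)}

classes = distances_by_hop(adj, emb)
h1 = histogram(classes[1], bins=3, lo=0.0, hi=3.0)  # 1-hop pairs
h2 = histogram(classes[2], bins=3, lo=0.0, hi=3.0)  # 2-hop pairs
separation = js_divergence(h1, h2)
```

In this toy case the 1-hop and 2-hop distance histograms do not overlap, so the divergence reaches its maximum; an embedding whose distances mix proximity classes would score closer to zero. How such pairwise divergences are aggregated into the score $I$ is defined in the paper itself and is not reproduced here.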
