Interpreting Node Embedding Distances Through $n$-order Proximity Neighbourhoods

Dougal Shakespeare, Camille Roth

TL;DR

The paper addresses the interpretation of inter-node distances in embedding spaces by linking them to $n$-hop proximities in the graph. It introduces three proximity networks, $S$, $P$, and $H$ (first-, second-, and higher-order proximity, respectively), and the Neighbourhood Attraction Score to quantify how well embedding distances preserve proximity structure. An interpretability score $I$, computed via histogram-based Jensen–Shannon divergence across proximity classes, enables cross-model comparison of how well distances reflect $1$st-, $2$nd-, and higher-order relations. Across two Deezer co-occurrence networks, the matrix-factorisation model SVD emerges as the most interpretable for inter-node distances, even for higher-order proximities; DeepWalk and node2vec are less interpretable, and SDNE$_S$ can be preferable for higher-order proximities in some cases. These findings inform model choice when interpretability of distances is crucial, and they highlight potential redundancy in optimising $1$st-order proximities, given their correlation with higher-order relations.
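To make the histogram-based Jensen–Shannon comparison concrete, the following is a minimal sketch, not the paper's exact definition of $I$ (which is not given in this summary). It assumes two samples of embedding distances, one per proximity class, bins them on a shared grid, and computes the base-2 Jensen–Shannon divergence between the two histograms. The function names `js_divergence` and `histogram_jsd`, the bin count, and the synthetic distance samples are all illustrative assumptions.

```python
import numpy as np


def js_divergence(p, q):
    """Base-2 Jensen-Shannon divergence between two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()  # normalise counts to probabilities
    q = q / q.sum()
    m = 0.5 * (p + q)  # mixture distribution

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute 0 by convention
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def histogram_jsd(dist_a, dist_b, bins=20):
    """Bin two samples of distances on a shared grid and compare the histograms."""
    lo = min(np.min(dist_a), np.min(dist_b))
    hi = max(np.max(dist_a), np.max(dist_b))
    edges = np.linspace(lo, hi, bins + 1)  # same bin edges for both samples
    hist_a, _ = np.histogram(dist_a, bins=edges)
    hist_b, _ = np.histogram(dist_b, bins=edges)
    return js_divergence(hist_a, hist_b)


# Synthetic stand-ins (assumed data): distances between first-order
# neighbours versus distances between unrelated "control" pairs.
rng = np.random.default_rng(0)
first_order = rng.normal(loc=0.3, scale=0.05, size=1000)
control = rng.normal(loc=0.8, scale=0.10, size=1000)
score = histogram_jsd(first_order, control)
```

With base-2 logarithms the divergence lies in $[0, 1]$: identical histograms give 0 and histograms with no overlapping bins give 1, so a larger value indicates that the embedding separates the two proximity classes more clearly by distance.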

Abstract

In the field of node representation learning, the task of interpreting latent dimensions has become a prominent, well-studied research topic. This work instead appraises the interpretability of another, rarely exploited feature of node embeddings that is increasingly utilised in recommendation and consumption diversity studies: inter-node embedded distances. Introducing a new method to measure how understandable the distances between nodes are, our work assesses how well the proximity weights derived from a network before embedding relate to the node closeness measurements after embedding. Testing several classical node embedding models, our findings reach a conclusion familiar to practitioners albeit rarely cited in the literature: the matrix factorisation model SVD is the most interpretable through 1st-, 2nd- and even higher-order proximities.

Paper Structure (13 sections, 11 equations, 2 figures, 2 tables)

Figures (2)

  • Figure 1: Attraction scores for the DeezerSess network before z-normalisation for first-order ($\dot{\delta}_S$), second-order ($\dot{\delta}_P$), higher-order ($\dot{\delta}_H$) and other ($\dot{\delta}_0$) proximities, the last of which acts as a control set.
  • Figure 2: Attraction scores for the DeezerPL network before z-normalisation for first-order ($\dot{\delta}_S$), second-order ($\dot{\delta}_P$), higher-order ($\dot{\delta}_H$) and other ($\dot{\delta}_0$) proximities, the last of which acts as a control set.