Het-node2vec: second order random walk sampling for heterogeneous multigraphs embedding

Mauricio Soto-Gomez, Peter Robinson, Carlos Cano, Ali Pashaeibarough, Emanuele Cavalleri, Justin Reese, Marco Mesiti, Giorgio Valentini, Elena Casiraghi

TL;DR

Het-node2vec is an extension of the node2vec algorithm designed for embedding heterogeneous graphs. It introduces a simple stochastic node and edge type switching strategy in second order random walk processes, together with an "attention mechanism" that focuses the random walks on specific node and edge types, allowing more accurate embeddings and more focused predictions on the node and edge types of interest.

Abstract

Many real-world problems are naturally modeled as heterogeneous graphs, where nodes and edges represent multiple types of entities and relations. Existing learning models for heterogeneous graph representation usually depend on the computation of specific, user-defined heterogeneous paths, or on the application of large and often non-scalable deep neural network architectures. We propose Het-node2vec, an extension of the node2vec algorithm designed for embedding heterogeneous graphs. Het-node2vec addresses the challenge of capturing both the topological and structural characteristics of graphs and the semantic information underlying the different types of nodes and edges of heterogeneous graphs, by introducing a simple stochastic node and edge type switching strategy in second order random walk processes. The proposed approach also introduces an "attention mechanism" to focus the random walks on specific node and edge types, thus allowing more accurate embeddings and more focused predictions on specific node and edge types of interest. Empirical results on benchmark datasets show that Het-node2vec achieves comparable or superior performance with respect to state-of-the-art methods for heterogeneous graphs in node label and edge prediction tasks.

Paper Structure

This paper contains 31 sections, 14 equations, 15 figures, 7 tables.

Figures (15)

  • Figure 1: A step (at time $t$) of a second order RW in a homogeneous graph. At time $t-1$ the random walk was at node $r$, i.e. $X_{t-1} = r$, and it has just moved from $r$ to node $v$. The probability of then moving from $v$ to any neighbor $x$ is proportional to $\alpha_{pq}\cdot w_{vx}$, where $w_{vx}$ denotes the weight of edge $vx$, and $\alpha_{pq}$ depends on a return parameter $p$ and on an outward parameter $q$.
  • Figure 2: A heterogeneous multigraph with nodes and edges of different types. Different colors are used to represent node and edge types. Multiple types of edges may connect the same pair of nodes.
  • Figure 3: Unnormalized transition probabilities for Het-node2vec in unweighted heterogeneous networks with: (a) heterogeneous nodes and homogeneous edges, i.e. $c=1$; (b) heterogeneous edges and homogeneous nodes, i.e. $s=1$; (c) heterogeneous nodes and heterogeneous edges. The color of nodes and the line-style (dashed or continuous) of edges represent their type. To simplify the notation, the edges are unweighted and nodes $x$ with type $\phi_*$ are denoted as $x_{\phi_*}$. Edge labels indicate the value of the function $\Phi_{sc}\cdot \alpha_{pq}$ (without considering edge weights) for a second order RW starting from $v_{\phi_1}$, i.e. $X_t = v_{\phi_1}$, and coming from node $r$, i.e. $X_{t-1} = r_{\phi_1}$ or $X_{t-1} = r_{\phi_2}$.
  • Figure 4: Definition and implementation of special node-type switching strategies. (\ref{fig:special_switching1}.1, \ref{fig:special_switching2}.1): When the RW starts from a non-special node, the two strategies bias the transition probability in the same way: both promote/demote transitions towards special node types. (\ref{fig:special_switching1}.2): When starting from a special node, the first special node-type switching strategy promotes/demotes the transition toward another special node, independently of the type of the preceding node in the RW. (\ref{fig:special_switching2}.2): When starting from a special node, the second special node-type switching strategy does not bias the choice of the next node; in this way, the RW gains broader knowledge about the node types in its neighborhood, including non-special nodes. (\ref{fig:special_switching_implementation1}, \ref{fig:special_switching_implementation2}): Each node-type switching function is implemented so that it depends only on the edge used in the transition and not on the direction of the RW. In the case of Figure \ref{fig:special_switching_implementation1}, the bias applied to the edge connecting $x_0$ and $x_1$ is $1/s^2$, i.e. $1/s$ times the bias of $1/s$ applied to every other outbound edge. After normalizing these biases so that they sum to 1, the probability of switching from $x_0$ to $x_1$ is $1/s$ times the probability of switching to any other neighbor (larger if $s < 1$, lower if $s > 1$), according to the Het-node2vec design. In the other case, the bias applied to all the outgoing edges (from $x_0$ to each of its neighbors) is constant, so the switching probabilities obtained after normalization do not promote or demote any edge.
  • Figure 5: Computation time of the Het-node2vec representation in Erdős-Rényi graphs with an average degree of 10 and in random power-law graphs with exponent $\alpha=2.2$. For each family, graph nodes and edges were randomly assigned to one of ten classes. Starting from each node, 10 random walks of length 100 were generated using the parameters $p=0.25$, $q=4$, and $s=0.5$. The set of random walks was used as the input of a Skipgram model to produce a vector embedding of size 100. Computations were performed on an AMD Rome 7452 processor (2.3 GHz, 32 cores) with 1024 GB of RAM.
  • ...and 10 more figures
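
The second order step described in the Figure 1 caption and the bias normalization described in the Figure 4 caption can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the search bias follows the standard node2vec definition ($1/p$ to return to $r$, $1$ for a neighbor of $v$ that is also a neighbor of $r$, $1/q$ otherwise), and the toy graph, the node labels, and the function names `alpha_pq`, `transition_probs`, and `normalize` are hypothetical names introduced here.

```python
from typing import Dict, Hashable

# Adjacency map of an undirected weighted graph: node -> {neighbor: weight}.
Graph = Dict[Hashable, Dict[Hashable, float]]


def alpha_pq(graph: Graph, r: Hashable, x: Hashable, p: float, q: float) -> float:
    """Standard node2vec search bias for a step to x when the previous node was r."""
    if x == r:
        return 1.0 / p  # distance 0 from r: return to the previous node
    if x in graph[r]:
        return 1.0      # distance 1 from r: x is also a neighbor of r
    return 1.0 / q      # distance 2 from r: the walk moves outward


def normalize(biases: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """Rescale non-negative biases so that they sum to 1."""
    total = sum(biases.values())
    return {node: bias / total for node, bias in biases.items()}


def transition_probs(
    graph: Graph, r: Hashable, v: Hashable, p: float, q: float
) -> Dict[Hashable, float]:
    """Probabilities of moving from v to each neighbor x, given X_{t-1} = r."""
    unnormalized = {
        x: alpha_pq(graph, r, x, p, q) * w_vx for x, w_vx in graph[v].items()
    }
    return normalize(unnormalized)


# Hypothetical toy graph (assumption): r-v, r-a, v-a, v-b, all with weight 1.
toy: Graph = {
    "r": {"v": 1.0, "a": 1.0},
    "v": {"r": 1.0, "a": 1.0, "b": 1.0},
    "a": {"r": 1.0, "v": 1.0},
    "b": {"v": 1.0},
}
probs = transition_probs(toy, r="r", v="v", p=0.25, q=4.0)

# Biases as described in the Figure 4 caption: 1/s^2 on the edge x0-x1 and 1/s
# on the other outbound edges (x2 and x3 are hypothetical extra neighbors).
s = 0.5
switch = normalize({"x1": 1 / s**2, "x2": 1 / s, "x3": 1 / s})
ratio = switch["x1"] / switch["x2"]  # equals 1/s after normalization
```

With $p=0.25$ and $q=4$ (the values quoted in the Figure 5 caption), the return step to $r$ receives the largest share of the probability mass in this toy graph, and the outward step to `b` the smallest. In the second part, normalization changes the absolute values of the biases but leaves their ratio at $1/s$, which matches the behavior the Figure 4 caption describes.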