Embedding networks with the random walk first return time distribution

Vedanta Thapar, Renaud Lambiotte, George T. Cantwell

TL;DR

This paper introduces the first return time distribution (FRTD) of a random walk as a principled node embedding, assigning to each node a normalized discrete distribution that encodes structural role. It develops a formal theory connecting FRTD to spectral properties, defines FRTD-equivalence and distances, and demonstrates that FRTD captures richer structure than eigenvalue spectra while lying between cospectrality and isomorphism. Through empirical studies on role extraction, graph alignment, and network randomization, the authors show that FRTD-based embeddings reveal functional node roles, improve alignment when used in conjunction with existing methods, and enable realistic random graph generation that preserves higher-order structure. The work further extends FRTD to directed/weighted graphs and discusses scalability, limitations, and future directions, including efficient estimation and potential decoding challenges. Overall, FRTD offers a compact, interpretable, and mathematically grounded embedding that complements traditional metrics and diffusion-based approaches for complex networks.

Abstract

We propose the first return time distribution (FRTD) of a random walk as an interpretable and mathematically grounded node embedding. The FRTD assigns a probability mass function to each node, allowing us to define a distance between any pair of nodes using standard metrics for discrete distributions. We present several arguments to motivate the FRTD embedding. First, we show that FRTDs are strictly more informative than eigenvalue spectra, yet insufficient for complete graph identification, thus placing FRTD equivalence between cospectrality and isomorphism. Second, we argue that FRTD equivalence between nodes captures structural similarity. Third, we empirically demonstrate that the FRTD embedding outperforms manually designed graph metrics in network alignment tasks. Finally, we show that random networks that approximately match the FRTD of a desired target also preserve other salient features. Together these results demonstrate the FRTD as a simple and mathematically principled embedding for complex networks.
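As a rough illustration of the idea in the abstract, the sketch below computes a truncated FRTD for every node of a small undirected graph and compares two nodes with a standard distance between discrete distributions. It is a minimal sketch under stated assumptions, not the authors' implementation: the walk is assumed to be a simple unbiased random walk, the distribution is cut off at a hypothetical `t_max` steps (so any remaining tail mass is ignored), first return probabilities are obtained from the standard renewal identity $p_{ii}(t) = \sum_{s=1}^{t} f_i(s)\, p_{ii}(t-s)$, and total variation is only one possible choice of metric. The names `frtd` and `tv_distance` are hypothetical.

```python
import numpy as np

def frtd(adj, t_max):
    """Truncated first return time distribution of a simple random walk.

    Returns F with shape (n, t_max), where F[i, t-1] is the probability that
    a walk started at node i first returns to i at step t.
    Assumes every node has at least one edge (no isolated nodes).
    """
    adj = np.asarray(adj, dtype=float)
    P = adj / adj.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
    n = P.shape[0]

    # returns[t, i] = probability of being at i after t steps (any visit).
    returns = np.empty((t_max + 1, n))
    Pt = np.eye(n)
    for t in range(t_max + 1):
        returns[t] = np.diag(Pt)
        Pt = Pt @ P

    # Renewal identity: p_ii(t) = sum_{s=1}^{t} f_i(s) * p_ii(t - s),
    # solved for f_i(t) using the already-computed f_i(1..t-1).
    F = np.zeros((n, t_max))
    for t in range(1, t_max + 1):
        earlier = sum(F[:, s - 1] * returns[t - s] for s in range(1, t))
        F[:, t - 1] = returns[t] - earlier
    return F

def tv_distance(p, q):
    """Total variation distance between two (truncated) distributions."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

# Path graph 0 - 1 - 2: the two end nodes are automorphically equivalent.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
F = frtd(A, t_max=6)
d_ends = tv_distance(F[0], F[2])    # end node vs. end node
d_mixed = tv_distance(F[0], F[1])   # end node vs. middle node
```

On this path graph the two end nodes receive identical rows of `F`, while the middle node, which always returns after exactly two steps, receives a different row, so `d_ends` is zero and `d_mixed` is positive.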

Paper Structure

This paper contains 20 sections, 3 theorems, 15 equations, 6 figures, 3 tables, 1 algorithm.

Key Result

Lemma 2.3

For an undirected graph with normalized adjacency matrix $\mathbf{X}$, let $\{ \lambda_\alpha \}_{\alpha=1}^{N}$ be the $N$ distinct eigenvalues of $\mathbf{X}$ with multiplicities $\{ N_\alpha \}_{\alpha=1}^{N}$ and corresponding orthonormal eigenvectors $\{ \boldsymbol{\psi}_{\alpha \beta} \}_{\be

Figures (6)

  • Figure 1: In panel (a) we show the Frucht graph (Frucht, 1949). The two red nodes are not automorphically equivalent but share the same first return time distribution. In (b) and (c) we give the smallest example of two non-isomorphic graphs that are FRTD equivalent; the nodes are colored according to FRTD equivalence across the two graphs.
  • Figure 1: Principal component visualization of different embedding methods for the Barbell graph. Embeddings are calculated with the default parameters for all the algorithms. Node2vec is a proximity-based embedding method and places nodes that are close to each other in the graph close in the embedding space. The other methods are structural (role-based) embeddings. Notably, struc2vec does not tend to assign identical embeddings to automorphically equivalent nodes. In contrast, GraphWave assigns structurally equivalent nodes to arbitrarily close embeddings (with appropriate hyper-parameter selection), and FRTD embeddings are always exactly equal for automorphic nodes.
  • Figure 1: Graph statistics for the directed C. elegans connectome compared with those obtained from $P_G(G')$ at different temperatures. Note that $\beta=0$ corresponds to sampling all graphs with the same in- and out-degree sequence with equal probability. Panel (a) shows the median correlation between various node-level descriptors of samples $G'$ and $G$; panel (b) displays the ratio of various global network descriptors for $G'$ vs $G$.
  • Figure 2: Clustering three example networks, namely the airports, Lord of the Rings, and C. elegans networks, described in the role extraction section. We show clusters found from the FRTD, adjacency matrix, and the Louvain algorithm. Nodes in the airports network are placed by latitude and longitude; the other two networks are drawn using the Fruchterman-Reingold force-directed algorithm (spring_layout). While the FRTD clusters pick out structural roles (nodes with similar structural positions in the network), the adjacency and Louvain clusters pick out nodes that are proximate in the network.
  • Figure 3: Performance of three alignment algorithms on a set of 18 undirected network datasets ordered by number of edges (see the appendix on the GAL datasets for details). For the first 15 graphs, the noisy copy is created by randomly removing $5\%$ of edges and permuting the nodes. For the last three networks, "real" noise is used (described in the same appendix). All experiments were run with the same hyperparameters ($\mu=1$), and no hyperparameter optimization was performed.
  • ...and 1 more figure

Theorems & Definitions (7)

  • Definition 2.1: FRTD equivalent nodes
  • Definition 2.2: FRTD equivalent graphs
  • Lemma 2.3
  • Proof 1
  • Theorem 2.4
  • Proof 2
  • Corollary 2.5