Role extraction by matrix equations and generalized random walks
Dario Fasino
TL;DR
The paper addresses role extraction in directed networks with heterogeneous degree distributions by introducing a direction-aware node similarity matrix $S^*$, defined as the solution of $S - \frac{\beta^2}{2}(P S P^T + Q S Q^T) = S_1$, where $P = D_{out}^{-1}A$, $Q = D_{in}^{-1}A^T$, and $S_1 = P P^T + Q Q^T$. It develops a globally convergent iteration to compute $S^*$, interprets the entries of $S^*$ via generalized random $\Psi$-walks, and analyzes the limiting behavior as $\beta$ varies, including a universal limit near $\beta^2=1$ and a baseline near $\beta=0$. A key theoretical result shows that $S^*$ is invariant under degree corrections in degree-corrected stochastic block models (SBMs): for $A = D_1\Theta B\Theta^T D_2$, $S^* = \Theta X \Theta^T$ with $X$ solving a reduced matrix equation. As a consequence, $S^*$ is block-constant with rank equal to the number of blocks, which implies robust role recovery. Numerical experiments on synthetic SBMs and a real faculty-hiring network illustrate that $S^*$ reliably identifies roles in directed graphs with heterogeneous degrees, outperforming the traditional Browet–Van Dooren-style similarity in the presence of degree variation. The work also discusses extensions to weighted networks and practical considerations, including computational cost and potential low-rank approximations for large-scale graphs.
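To make the matrix equation concrete, the following is a minimal sketch that solves it with a plain fixed-point iteration, $S \leftarrow S_1 + \frac{\beta^2}{2}(P S P^T + Q S Q^T)$. This is not necessarily the iteration developed in the paper; it is only one simple scheme that converges when $0 \le \beta < 1$, because $P$ and $Q$ are row-stochastic. The function name `role_similarity`, the tolerance, and all values in the toy example are illustrative assumptions, and the sketch assumes every node has at least one incoming and one outgoing link.

```python
import numpy as np

def role_similarity(A, beta, tol=1e-12, max_iter=10_000):
    """Approximate the solution of S - (beta^2/2)(P S P^T + Q S Q^T) = S1.

    Hypothetical helper: plain fixed-point iteration, assuming 0 <= beta < 1
    and strictly positive in- and out-degrees.
    """
    A = np.asarray(A, dtype=float)
    if not 0.0 <= beta < 1.0:
        raise ValueError("this sketch assumes 0 <= beta < 1")
    d_out = A.sum(axis=1)
    d_in = A.sum(axis=0)
    if np.any(d_out == 0) or np.any(d_in == 0):
        raise ValueError("this sketch assumes no zero in- or out-degrees")
    P = A / d_out[:, None]      # P = D_out^{-1} A
    Q = A.T / d_in[:, None]     # Q = D_in^{-1} A^T
    S1 = P @ P.T + Q @ Q.T
    c = 0.5 * beta**2
    S = S1.copy()
    for _ in range(max_iter):
        S_next = S1 + c * (P @ S @ P.T + Q @ S @ Q.T)
        if np.max(np.abs(S_next - S)) < tol:
            return S_next
        S = S_next
    raise RuntimeError("fixed-point iteration did not converge")

# Toy degree-corrected block structure A = D1 Theta B Theta^T D2.
# All numbers below are made up for illustration.
groups = np.array([0, 0, 0, 1, 1, 2, 2, 2])
Theta = np.eye(3)[groups]                    # n x k block membership matrix
B = np.array([[0.1, 1.0, 0.2],
              [0.3, 0.1, 1.0],
              [1.0, 0.2, 0.1]])
d1 = np.array([1.0, 2.0, 0.5, 3.0, 1.0, 0.7, 1.5, 4.0])
d2 = np.array([2.0, 1.0, 1.0, 0.5, 2.5, 1.0, 3.0, 0.8])
A = np.diag(d1) @ Theta @ B @ Theta.T @ np.diag(d2)

S = role_similarity(A, beta=0.9)

# Check the block-constant form S = Theta X Theta^T, taking X from one
# representative node per block.
reps = [0, 3, 5]
X = S[np.ix_(reps, reps)]
print(np.allclose(S, Theta @ X @ Theta.T))
```

In this toy setting the degree corrections `d1` and `d2` vary widely across nodes, yet the rows of $P$ and $Q$ depend only on block membership, so the computed similarity depends only on the blocks of the two nodes. Each sweep costs a few dense $n \times n$ products, which is why low-rank approximations become relevant for large graphs.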
Abstract
The nodes in a network can be grouped into 'roles' based on similar connection patterns. This is usually achieved by defining a pairwise node similarity matrix and then clustering the rows and columns of this matrix. This paper presents a new similarity matrix for solving role extraction problems in directed networks. The matrix is defined as the solution of a matrix equation, and it measures node similarity through random walks that can proceed both along the link direction and in the opposite direction. The resulting node similarity measure performs remarkably well in role extraction tasks on directed networks with heterogeneous node degree distributions.
