Two Layer Walk: A Community-Aware Graph Embedding
He Yu, Jing Liu
TL;DR
TLWalk addresses locality bias in traditional random-walk graph embeddings by introducing a two-layer, community-aware walk that samples separately within dense communities and across bridging nodes. It formalizes this with two transition matrices, $M_I$ for intra-community and $M_C$ for inter-community dynamics, and gives a matrix-factorization interpretation through a shifted pointwise mutual information (PMI) matrix that combines both layers. Theoretical analysis shows that TLWalk mitigates locality bias and relates it to a joint matrix factorization, while empirical results show improvements in link prediction, node clustering, and node classification across diverse networks, including LFR benchmarks, with strong robustness and scalability. Overall, TLWalk provides a scalable framework that requires no additional parameters for capturing mesoscopic community structure in graph embeddings, with practical implications for social, biological, and ecological network analysis.
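The matrix view above can be illustrated with a small sketch. This section does not say how $M_I$ and $M_C$ are built or how the two layers are combined, so everything here is an assumption: a community partition is given as input, $M_I$ is the row-normalized adjacency restricted to edges inside a community, $M_C$ is the row-normalized adjacency restricted to edges that cross communities, and the two layers are combined by averaging their powers, $S = \tfrac{1}{2}\sum_{r=1}^{T}(M_I^r + M_C^r)$. The shifted PMI then follows the NetMF closed form for DeepWalk, $\log\max\big(\tfrac{\mathrm{vol}(G)}{bT}\,S\,D^{-1},\,1\big)$, with window size $T$ and negative-sample count $b$. The function names `layer_transitions`, `shifted_pmi`, and `embed` are hypothetical.

```python
import numpy as np


def layer_transitions(A, labels):
    """Split A into intra- and inter-community layers and row-normalize each.

    Hypothetical construction: M_I keeps edges whose endpoints share a label,
    M_C keeps edges that cross labels. Rows with no edges in a layer stay zero.
    """
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    A_I = np.where(same, A, 0.0)
    A_C = np.where(same, 0.0, A)

    def row_normalize(B):
        deg = B.sum(axis=1, keepdims=True)
        # Avoid division by zero for nodes that have no edges in this layer.
        return np.divide(B, deg, out=np.zeros_like(B), where=deg > 0)

    return row_normalize(A_I), row_normalize(A_C)


def shifted_pmi(A, M_I, M_C, T=3, b=1.0):
    """NetMF-style shifted PMI from both layers (layer averaging is an assumption).

    Assumes every node has degree > 0 in the full graph A.
    """
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    vol = deg.sum()
    n = A.shape[0]
    S = np.zeros((n, n))
    P_I, P_C = np.eye(n), np.eye(n)
    for _ in range(T):
        P_I = P_I @ M_I  # r-step intra-community transitions
        P_C = P_C @ M_C  # r-step inter-community transitions
        S += 0.5 * (P_I + P_C)
    M = (vol / (b * T)) * S / deg[None, :]  # right-multiply by D^{-1}
    return np.log(np.maximum(M, 1.0))  # truncated log, as in NetMF


def embed(pmi, dim=2):
    """Factorize the PMI matrix with an SVD and keep the top `dim` components."""
    U, s, _ = np.linalg.svd(pmi)
    return U[:, :dim] * np.sqrt(s[:dim])


if __name__ == "__main__":
    # Two triangles {0,1,2} and {3,4,5} joined by the bridge edge (2, 3).
    A = np.zeros((6, 6))
    for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
        A[u, v] = A[v, u] = 1.0
    M_I, M_C = layer_transitions(A, [0, 0, 0, 1, 1, 1])
    print(embed(shifted_pmi(A, M_I, M_C), dim=2).shape)  # (6, 2)
```

In this toy graph only nodes 2 and 3 have non-zero rows in $M_C$, so the inter-community layer contributes signal only through the bridge, while $M_I$ carries the dense within-triangle structure.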
Abstract
Community structures are critical for understanding the mesoscopic organization of networks, bridging local and global patterns. While methods such as DeepWalk and node2vec capture local positional information through random walks, they fail to preserve community structures. Other approaches like modularized nonnegative matrix factorization and evolutionary algorithms address this gap but are computationally expensive and unsuitable for large-scale networks. To overcome these limitations, we propose Two Layer Walk (TLWalk), a novel graph embedding algorithm that incorporates hierarchical community structures. TLWalk balances intra- and inter-community relationships through a community-aware random walk mechanism without requiring additional parameters. Theoretical analysis demonstrates that TLWalk effectively mitigates locality bias. Experiments on benchmark datasets show that TLWalk outperforms state-of-the-art methods, achieving up to 3.2% accuracy gains for link prediction tasks. By encoding dense local and sparse global structures, TLWalk proves robust and scalable across diverse networks, offering an efficient solution for network analysis.
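The community-aware random walk mechanism can be sketched as two separate walk corpora, one per layer. This is a hypothetical reading of the abstract, not the authors' sampler: communities are supplied as a label per node (for example, from any community detection algorithm), intra-community walks move only along edges inside a community, and inter-community walks move only along edges that connect different communities, so they stay on bridging nodes. The names `build_layers`, `walk`, and `two_layer_corpus` are illustrative, and `walks_per_node` and `length` are ordinary DeepWalk-style sampling settings rather than TLWalk parameters.

```python
import random
from collections import defaultdict


def build_layers(edges, labels):
    """Split an undirected edge list into intra- and inter-community adjacency lists."""
    intra, inter = defaultdict(list), defaultdict(list)
    for u, v in edges:
        layer = intra if labels[u] == labels[v] else inter
        layer[u].append(v)
        layer[v].append(u)
    return intra, inter


def walk(adj, start, length, rng):
    """Uniform random walk on one layer; stops early at a node with no neighbors there."""
    path = [start]
    while len(path) < length:
        nbrs = adj.get(path[-1])
        if not nbrs:
            break
        path.append(rng.choice(nbrs))
    return path


def two_layer_corpus(edges, labels, walks_per_node=2, length=5, seed=0):
    """Collect intra-community walks first, then inter-community (bridging) walks."""
    rng = random.Random(seed)
    intra, inter = build_layers(edges, labels)
    corpus = []
    for layer in (intra, inter):
        for node in sorted(layer):
            for _ in range(walks_per_node):
                corpus.append(walk(layer, node, length, rng))
    return corpus


if __name__ == "__main__":
    # Two triangles {0,1,2} and {3,4,5} joined by the bridge edge (2, 3).
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    labels = [0, 0, 0, 1, 1, 1]
    corpus = two_layer_corpus(edges, labels)
    print(len(corpus))  # 16: 12 intra-community walks + 4 bridging walks
```

The resulting walks could then be passed to a skip-gram model (for example, gensim's `Word2Vec` after converting node ids to strings). Keeping the layers in separate walks means a long intra-community walk never drowns out the rare bridge edges, which is one plausible way to counter the locality bias the abstract describes.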
