Two Layer Walk: A Community-Aware Graph Embedding

He Yu, Jing Liu

TL;DR

TLWalk addresses the locality bias of traditional random-walk graph embeddings by introducing a two-layer, community-aware walk that samples separately within dense communities and across bridging nodes. It formalizes this with two transition matrices, $M_I$ for intra-community and $M_C$ for inter-community dynamics, and admits a matrix-factorization interpretation through a shifted pointwise mutual information (PMI) matrix that combines both layers. Theoretical analysis shows that TLWalk mitigates locality bias and relates it to a joint matrix factorization, while empirical results show improvements in link prediction, node clustering, and classification across diverse networks, including LFR benchmarks, with strong robustness and scalability. Overall, TLWalk provides a scalable framework that captures mesoscopic community structure in graph embeddings without requiring additional parameters, with practical implications for social, biological, and ecological network analysis.
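To make the two-layer idea concrete, the sketch below splits a toy graph into an intra-community layer and an inter-community layer and row-normalizes each into a transition matrix, in the spirit of $M_I$ and $M_C$. It is only an illustration under stated assumptions: the summary does not give the paper's exact construction, so the masking-and-normalizing rule, the function names `split_transition_matrices` and `sample_walk`, the toy graph, and the use of precomputed community labels are all hypothetical choices rather than the authors' method.

```python
import numpy as np

def split_transition_matrices(adj, labels):
    """Return (M_I, M_C): row-normalized intra- and inter-community layers.

    Assumed construction for illustration only, not the paper's definition.
    """
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    # same[i, j] is True when nodes i and j carry the same community label.
    same = labels[:, None] == labels[None, :]
    layers = []
    for mask in (same, ~same):
        layer = adj * mask
        row_sums = layer.sum(axis=1, keepdims=True)
        # Nodes with no edges in this layer keep an all-zero row.
        layers.append(np.divide(layer, row_sums,
                                out=np.zeros_like(layer),
                                where=row_sums > 0))
    return layers[0], layers[1]

def sample_walk(M, start, length, rng):
    """Sample a walk of at most `length` nodes using transition matrix M."""
    walk = [start]
    for _ in range(length - 1):
        probs = M[walk[-1]]
        if probs.sum() == 0:  # no outgoing edge in this layer: stop early
            break
        walk.append(int(rng.choice(len(probs), p=probs)))
    return walk

# Toy graph (hypothetical): two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = np.zeros((6, 6))
for u, v in edges:
    adj[u, v] = adj[v, u] = 1.0
labels = [0, 0, 0, 1, 1, 1]  # community labels assumed to be given

M_I, M_C = split_transition_matrices(adj, labels)
rng = np.random.default_rng(0)
intra_walk = sample_walk(M_I, start=0, length=8, rng=rng)  # stays inside {0,1,2}
inter_walk = sample_walk(M_C, start=2, length=4, rng=rng)  # hops across the bridge
```

In this toy setup only the bridging nodes 2 and 3 have nonzero rows in the inter-community matrix, so walks on that layer are confined to bridge edges, while walks on the intra-community layer never leave their starting community.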

Abstract

Community structures are critical for understanding the mesoscopic organization of networks, bridging local and global patterns. While methods such as DeepWalk and node2vec capture local positional information through random walks, they fail to preserve community structures. Other approaches like modularized nonnegative matrix factorization and evolutionary algorithms address this gap but are computationally expensive and unsuitable for large-scale networks. To overcome these limitations, we propose Two Layer Walk (TLWalk), a novel graph embedding algorithm that incorporates hierarchical community structures. TLWalk balances intra- and inter-community relationships through a community-aware random walk mechanism without requiring additional parameters. Theoretical analysis demonstrates that TLWalk effectively mitigates locality bias. Experiments on benchmark datasets show that TLWalk outperforms state-of-the-art methods, achieving up to 3.2% accuracy gains for link prediction tasks. By encoding dense local and sparse global structures, TLWalk proves robust and scalable across diverse networks, offering an efficient solution for network analysis.

Paper Structure

This paper contains 9 sections, 29 equations, 3 figures, 5 tables, 1 algorithm.

Figures (3)

  • Figure 1: Illustration of the hierarchical organization in the Zachary’s Karate Club network. The intra-community level (colored groups) captures dense connections within communities, while the inter-community level (pink edges) highlights sparser connections linking different communities. This structure forms the basis for TLWalk's two-layer embedding approach.
  • Figure 2: AUC gain/loss of TLWalk over baseline methods. TLWalk achieves significant gains over baseline methods, particularly on fb-pages-food and soc-hamsterster, while maintaining consistent performance across diverse datasets.
  • Figure 3: Performance comparison of TLWalk and baseline methods on LFR benchmark networks as a function of the mixing parameter $\mu$. (A) $\tau_1 = 2.1$ (highly heterogeneous degree distribution), (B) $\tau_1 = 3.0$ (less heterogeneous degree distribution). TLWalk consistently achieves superior performance across all levels of $\mu$.