Optimization-Free Graph Embedding via Distributional Kernel for Community Detection

Shuaibin Song, Kai Ming Ting, Kaifeng Zhang, Tianrun Liang

TL;DR

This work tackles over-smoothing in NAS-based graph embeddings for community detection by identifying two distributional factors—node-vector density and node-degree distribution—that traditional methods ignore. It introduces an optimization-free Weighted Distributional Kernel (WDK) and its multi-level extension (mWDK) that jointly model these distributions using Isolation Kernel to preserve node distinguishability and cluster structure. Empirical results show that mWDK yields equal-density, well-separated embeddings, outperforming a range of deep learning methods on 14 real and synthetic datasets, including a large-scale graph, while maintaining scalability. The findings argue for distribution-aware, optimization-free embeddings as a robust approach to unsupervised graph clustering and spectral clustering.

Abstract

Neighborhood Aggregation Strategy (NAS) is a widely used approach in graph embedding, underpinning both Graph Neural Networks (GNNs) and Weisfeiler-Lehman (WL) methods. However, NAS-based methods are prone to over-smoothing (the loss of node distinguishability as the number of iterations increases), which limits their effectiveness. This paper identifies two characteristics of a network, namely the distribution of nodes and the distribution of node degrees, that are critical for expressive representation but have been overlooked by existing methods. We show that these overlooked characteristics contribute significantly to the over-smoothing of NAS-based methods. To address this, we propose a novel weighted distribution-aware kernel that embeds nodes while taking their distributional characteristics into account. Our method has three distinguishing features: (1) it is the first method to explicitly incorporate both distributional characteristics; (2) it requires no optimization; and (3) it effectively mitigates the adverse effects of over-smoothing, allowing WL to preserve node distinguishability and expressiveness even after many iterations of embedding. Experiments demonstrate that our method achieves superior community detection performance via spectral clustering, outperforming existing graph embedding methods, including deep learning methods, on standard benchmarks.
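
The over-smoothing effect described in the abstract can be illustrated with a minimal sketch. It assumes a generic mean-aggregation scheme with self-loops on a toy graph of two 4-node cliques joined by one edge; this is not the paper's WL, WDK, or mWDK procedure, and the names `mean_aggregate` and `max_pairwise_distance` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

def mean_aggregate(adj, feats, iterations):
    """Repeatedly replace each node vector with the mean over itself and its neighbours."""
    a_hat = adj + np.eye(adj.shape[0])               # add self-loops (assumption)
    p = a_hat / a_hat.sum(axis=1, keepdims=True)     # row-normalise to get averaging weights
    out = feats.copy()
    for _ in range(iterations):
        out = p @ out
    return out

def max_pairwise_distance(feats):
    """Largest Euclidean distance between any two node vectors."""
    diffs = feats[:, None, :] - feats[None, :, :]
    return float(np.sqrt((diffs ** 2).sum(axis=-1)).max())

# Toy graph: two 4-node cliques connected by the single edge (3, 4).
adj = np.zeros((8, 8))
adj[:4, :4] = 1
adj[4:, 4:] = 1
np.fill_diagonal(adj, 0)
adj[3, 4] = adj[4, 3] = 1

feats = np.eye(8)  # one-hot input vectors, so every node starts out distinguishable

# Spread of the node vectors after h rounds of aggregation.
spread = {h: max_pairwise_distance(mean_aggregate(adj, feats, h)) for h in (0, 1, 5, 100)}
for h, value in spread.items():
    print(f"h={h:3d}  max pairwise distance = {value:.4f}")
```

Under these assumptions the spread shrinks toward zero as `h` grows: each round replaces every node vector with a convex combination of the previous vectors, so the two cliques eventually become indistinguishable.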

Paper Structure

This paper contains 30 sections, 2 theorems, 12 equations, 10 figures, 5 tables, and 2 algorithms.

Key Result

Theorem 1

Let $\{\Lambda_i\}_{1}^{\psi}$ be a Voronoi tessellation of $\psi$ data points, where $\Lambda_i$ denotes the Voronoi cell centered at $x_i$, and $M_X(\Lambda_i)$ denotes an estimator of the probability mass of $\Lambda_i$ based on a data set $X \subset \mathbb{R}^d$. The expectation $E[M_X(\Lambda_

Figures (10)

  • Figure 1: Neighborhood aggregation in input space (top) vs. distribution-aware space derived from the distributional kernel mWDK (bottom). Standard NAS leads to over-smoothing, merging two subgraphs into indistinguishable representations. In contrast, mWDK preserves discriminability by incorporating their distributional differences.
  • Figure 2: Illustrations of WL, WDK and mWDK embedding methods: $v_{\mathrm{WL}}^{h}=\bigcup_{u \in \mathcal{V}_v} v^{h}$, $\ \mathcal{V}_{\mathrm{WDK}}^{h}=\bigcup_{u \in \mathcal{V}_v} \Phi\left(\tilde{\mathcal{P}}_{\mathbf{V}_{u}^{h}}\right)$ and $\mathcal{V}_{\mathrm{mWDK}}^{\hslash}=\bigcup_{u \in \mathcal{V}_v} \Phi\left(\tilde{\mathsf{P}}_{\mathbf{V}_{u}^{\hslash}}\right)$.
  • Figure 3: Similarity of WL, WDK, and mWDK on the UE dataset, which has two communities $\mathcal{C}_1$ and $\mathcal{C}_2$ (described in the Datasets section) with a uniform distribution of node degrees but an imbalanced distribution of nodes. The top-right table compares the smoothing rates of WL, WDK, and mWDK. The smoothing rate at the $h$-th iteration for $\mathcal{C}_1$ and $\mathcal{C}_2$ is computed as $R_h(\mathcal{C}_1, \mathcal{C}_2) = \frac{S_h(\mathcal{C}_1,\mathcal{C}_2) - S_0(\mathcal{C}_1,\mathcal{C}_2)}{h}$, where $S_h(\mathcal{C}_1,\mathcal{C}_2)$ is the similarity at the $h$-th iteration and $S_0(\mathcal{C}_1,\mathcal{C}_2)$ is the similarity in the input space.
  • Figure 4: t-SNE 2D visualization of the embedded spaces of WL, WDK, and mWDK at different iterations $h$ on the same UE dataset used in Figure 3.
  • Figure 5: t-SNE 2D visualization of the embedded spaces of WL, WDK, and mWDK at different iterations $h$ on the EU dataset, which has clusters with different node degrees (see the Datasets section).
  • ...and 5 more figures
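
The smoothing rate in the Figure 3 caption is a simple per-iteration average of the change in inter-community similarity. The sketch below computes it directly from that formula. The caption does not say how the similarity $S_h$ is measured, so `mean_cosine_similarity` is an assumed stand-in (average cosine similarity over cross-community node pairs), and both function names are hypothetical.

```python
import numpy as np

def mean_cosine_similarity(emb_a, emb_b):
    """Assumed stand-in for S_h: mean cosine similarity over all cross-community node pairs."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    return float((a @ b.T).mean())

def smoothing_rate(s_h, s_0, h):
    """R_h = (S_h - S_0) / h, following the formula in the Figure 3 caption."""
    if h < 1:
        raise ValueError("h must be a positive iteration count")
    return (s_h - s_0) / h

# Illustrative numbers only (not results from the paper):
# similarity rises from 0.25 in the input space to 0.75 after 4 iterations.
rate = smoothing_rate(s_h=0.75, s_0=0.25, h=4)
print(rate)
```

A larger positive rate means the two communities become similar faster, that is, the embedding over-smooths more quickly; a rate near zero means the communities stay about as separable as they were in the input space.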

Theorems & Definitions (8)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Definition 6
  • Theorem 1
  • Theorem 2