PSMC: Provable and Scalable Algorithms for Motif Conductance Based Graph Clustering

Longlong Lin, Tao Jia, Zeli Wang, Jin Zhao, Rong-Hua Li

TL;DR

The paper tackles motif-based higher-order graph clustering by addressing two limitations of the canonical two-stage reweighting framework: it lacks provable guarantees for motifs with more than three vertices, and its motif enumeration step is prohibitively expensive. It introduces PSMC, a Provable and Scalable Motif Conductance algorithm that defines a locally computable Motif Resident score and uses an iterative peeling process to produce clusters with a fixed, motif-independent approximation bound: φ_M(Ŝ) ≤ 1/2 + 1/2 φ_M^*. The method further accelerates computation with dynamic updates and with lower and upper bounds on motif counts derived from Turán-type results and colorful h-star/wedge concepts, achieving near-linear running time in practice. Empirical results on real and synthetic graphs show speedups of 3.2× to 32× and quality improvements of at least 12× over strong baselines, with a smaller memory footprint, demonstrating PSMC’s practicality for massive networks.

Abstract

Higher-order graph clustering aims to partition the graph using frequently occurring subgraphs. Motif conductance is one of the most promising higher-order graph clustering models due to its strong interpretability. However, existing motif conductance based graph clustering algorithms are mainly limited by a seminal two-stage reweighting framework, which must enumerate all motif instances to obtain an edge-weighted graph for partitioning. This framework has two major defects: (1) It can only provide a quadratic bound for motifs with three vertices, and whether there is provable clustering quality for other motifs is still an open question. (2) Enumerating motif instances incurs prohibitively high costs for large motifs or large dense graphs due to combinatorial explosion. Besides, expensive spectral clustering or local graph diffusion on the edge-weighted graph also makes existing methods unable to handle massive graphs with millions of nodes. To overcome these limitations, we propose a Provable and Scalable Motif Conductance algorithm, PSMC, which has a fixed and motif-independent approximation ratio for any motif. Specifically, PSMC first defines a new vertex metric, Motif Resident, based on the given motif, which can be computed locally. Then, it iteratively deletes the vertex with the smallest motif resident value, using novel dynamic update techniques to do so efficiently. Finally, it outputs the locally optimal result found during this iterative process. To further boost efficiency, we propose several effective bounds to estimate the motif resident value of each vertex, which can greatly reduce computational costs. Empirical results show that our proposed algorithms achieve a 3.2 to 32 times speedup and improve quality by at least 12 times over the baselines.
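The peeling idea described in the abstract can be illustrated with a minimal sketch for the triangle motif. Several things here are assumptions rather than the paper's method: the Motif Resident metric is not defined in this summary, so the sketch substitutes the number of surviving triangles at each vertex as a stand-in score; it recomputes scores from scratch instead of using dynamic updates or bounds; and it uses the standard motif conductance (cut triangles divided by the smaller triangle volume). The function names and the toy graph are hypothetical, and the sketch carries no approximation guarantee.

```python
from itertools import combinations


def triangles(adj):
    """List each triangle once as a tuple (u, v, w) with u < v < w."""
    tris = []
    for u in adj:
        for v, w in combinations(sorted(adj[u]), 2):
            if u < v and w in adj[v]:
                tris.append((u, v, w))
    return tris


def triangle_conductance(tris, S, V):
    """Cut triangles divided by the smaller triangle volume of S and V \\ S."""
    S = set(S)
    inside = [sum(x in S for x in t) for t in tris]
    cut = sum(1 for k in inside if 0 < k < 3)   # triangles split by the cut
    vol_S = sum(inside)                         # triangle endpoints in S
    vol_rest = 3 * len(tris) - vol_S            # triangle endpoints outside S
    denom = min(vol_S, vol_rest)
    return cut / denom if denom > 0 else float("inf")


def peel_by_triangle_degree(adj):
    """Repeatedly delete the lowest-scoring vertex; keep the best set seen.

    The score is a stand-in (surviving triangle count per vertex),
    NOT the paper's Motif Resident metric.
    """
    V = set(adj)
    tris = triangles(adj)
    remaining = set(V)
    best_set, best_phi = None, float("inf")
    while len(remaining) > 1:
        # Naive recomputation of scores on the surviving subgraph.
        score = {v: 0 for v in remaining}
        for t in tris:
            if all(x in remaining for x in t):
                for x in t:
                    score[x] += 1
        victim = min(remaining, key=lambda v: (score[v], v))
        remaining.remove(victim)
        phi = triangle_conductance(tris, remaining, V)
        if phi < best_phi:
            best_phi, best_set = phi, set(remaining)
    return best_set, best_phi


def two_cliques():
    """Toy graph: two 4-cliques {0..3} and {4..7} joined by the edge (3, 4)."""
    edges = list(combinations(range(4), 2))
    edges += list(combinations(range(4, 8), 2))
    edges.append((3, 4))
    adj = {v: set() for v in range(8)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


adj = two_cliques()
best_set, best_phi = peel_by_triangle_degree(adj)
print(sorted(best_set), best_phi)
```

On this toy graph the bridging edge belongs to no triangle, so the loop ends up reporting one of the two cliques as the best set. Recomputing every score per deletion is far slower than the dynamic updates the abstract describes; the sketch only shows the control flow of peeling and tracking the best intermediate set.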

Paper Structure

This paper contains 19 sections, 11 theorems, 5 equations, 7 figures, 3 tables, 1 algorithm.

Key Result

Theorem 1

Given a graph $G(V,E)$ and a motif $\mathbb{M}$, for any $S \subseteq V$, we have where $\phi^{\mathcal{G}^\mathbb{M}}(S)$ is the edge-based conductance of $S$ in terms of the weighted graph $\mathcal{G}^\mathbb{M}$ and $I(\cdot)$ is the indicator function. Note that when $k(\mathbb{M})>4$, the relationship between $\phi_{\mathbb{M}}(S)$ and $\phi^{\mathcal{G}^\mathbb{M}}(S)$ is uncl

Figures (7)

  • Figure 1: Illustration of the traditional edge-based conductance and the motif conductance on a synthetic graph. There are 47 edges and 60 triangles. The blue dotted line indicates the optimal cut when the motif is an edge and the corresponding conductance is $\frac{4}{\min\{42,52\}}$. The green dotted line represents the optimal cut when the motif is a triangle and the corresponding triangle conductance is $\frac{2}{\min\{116,64\}}$. Motif conductance is more likely to preserve motif instances compared with edge-based conductance.
  • Figure 2: Colorful $h$-stars and colorful ($h-2$)-wedges.
  • Figure 3: Runtime (seconds) of different motif conductance algorithms with varying $k(\mathbb{M})$.
  • Figure 4: Scalability testing on synthetic graphs.
  • Figure 5: Memory overhead on real-world graphs (excluding the size of the graph itself).
  • ...and 2 more figures
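
As a worked check of the numbers in the Figure 1 caption, the two conductance values can be evaluated directly. The consistency checks on the right assume that the volume of a side is the sum of vertex degrees (edge case) or of per-vertex triangle counts (triangle case) on that side, so the two volumes should add up to twice the edge count and three times the triangle count respectively:

```latex
\begin{align*}
\phi_{\text{edge}} &= \frac{4}{\min\{42, 52\}} = \frac{4}{42} \approx 0.095,
  & 42 + 52 &= 94 = 2 \times 47, \\
\phi_{\triangle} &= \frac{2}{\min\{116, 64\}} = \frac{2}{64} \approx 0.031,
  & 116 + 64 &= 180 = 3 \times 60.
\end{align*}
```

Both sums match the 47 edges and 60 triangles stated in the caption.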

Theorems & Definitions (15)

  • Definition 1: higher-order graph clustering
  • Definition 2: motif conductance
  • Theorem 1 [benson2016higher]
  • Theorem 2: Cheeger inequality [benson2016higher]
  • Definition 3: Motif Degree
  • Definition 4: motif resident
  • Lemma 1: Monotonicity
  • Lemma 2
  • Lemma 3: Reformulation of Motif Conductance
  • Lemma 4
  • ...and 5 more