GraphC: Parameter-free Hierarchical Clustering of Signed Graph Networks v2

Muhieddine Shebaro, Lucas Rusnak, Martin Burtscher, Jelena Tešić

TL;DR

GraphC introduces a parameter-free, hierarchical clustering approach for signed networks that avoids spectral decompositions by exploiting Harary cuts on balanced states. It formalizes a duality between balance theory and spectral properties, defines a joint loss that balances positive-internal and negative-between edge structures, and uses a Gamma-controlled pruning strategy to scale to large graphs. Across Konect and Amazon datasets, GraphC demonstrates strong improvements in pos_in and neg_out over baselines, with robust performance even when traditional methods fail to converge. The work offers a scalable, k-independent alternative for signed-graph community detection with practical impact for large-scale social, biological, and recommendation networks.
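One standard instance of the link between balance and spectra is the signed Laplacian. It may or may not match the paper's exact formalization. Here $\bar{L} = \bar{D} - A$ uses unsigned degrees, so $x^\top \bar{L} x = \sum_{(u,v)\in E} (x_u - \sigma_{uv} x_v)^2$. $\bar{L}$ is therefore positive semidefinite, and for a balanced graph the $\pm 1$ vector $s$ that encodes its Harary bipartition gives $s^\top \bar{L} s = 0$. The Python sketch below checks this on two triangles. The function names are hypothetical and self-loops are assumed absent. The brute force over $\pm 1$ vectors only shows that no switching vector annihilates the unbalanced triangle, not the full eigenvalue statement.

```python
from itertools import product


def signed_laplacian(n, edges):
    """Signed Laplacian L = D - A for edges (u, v, sign), sign in {+1, -1}.

    D holds the unsigned degrees, so the diagonal counts every incident edge once.
    Assumes no self-loops.
    """
    L = [[0] * n for _ in range(n)]
    for u, v, sign in edges:
        L[u][v] -= sign
        L[v][u] -= sign
        L[u][u] += 1
        L[v][v] += 1
    return L


def quadratic_form(L, x):
    """Return x^T L x, which equals the sum over edges of (x_u - sign * x_v)^2."""
    n = len(x)
    return sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))


def min_over_sign_vectors(L):
    """Smallest x^T L x over all +/-1 vectors x (brute force, small graphs only)."""
    return min(quadratic_form(L, x) for x in product((1, -1), repeat=len(L)))


# Balanced triangle: Harary sides {0, 1} and {2}, encoded as s = (1, 1, -1).
balanced = signed_laplacian(3, [(0, 1, 1), (1, 2, -1), (0, 2, -1)])
print(quadratic_form(balanced, [1, 1, -1]))  # 0: s lies in the null space of the PSD matrix

# Unbalanced triangle: the product of edge signs around the cycle is negative.
unbalanced = signed_laplacian(3, [(0, 1, 1), (1, 2, 1), (0, 2, -1)])
print(min_over_sign_vectors(unbalanced))  # 4: every +/-1 vector violates one edge
```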

Abstract

Spectral clustering methods, when extended to signed graphs, have shown notable limitations in capturing inherent grouping relationships, and recent findings show that their effectiveness deteriorates substantially on large signed networks. We introduce GraphC, a scalable hierarchical graph clustering algorithm designed to identify high-quality clusters in signed networks of varying sizes. GraphC aims to preserve the fraction of positive edges within communities during partitioning while maximizing the fraction of negative edges between communities. Importantly, GraphC does not require a predetermined number of clusters $k$. We evaluate GraphC on fourteen datasets against ten baseline signed graph clustering algorithms, and we demonstrate its scalability on large signed graphs derived from Amazon datasets, each comprising tens of millions of vertices and edges. GraphC achieves an average cumulative improvement of 18.64% over the second-best baseline for each signed graph, where the cumulative score is the sum of the fraction of positive edges within communities and the fraction of negative edges between communities. This average excludes graphs on which all baseline algorithms failed to run to completion.
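The cumulative score adds two fractions, pos_in and neg_out. This section does not give their exact denominators, so the Python sketch below assumes the following. pos_in is the number of positive intra-community edges divided by all intra-community edges. neg_out is the number of negative inter-community edges divided by all inter-community edges. An empty category scores 0.0. `pos_in_neg_out` is a hypothetical helper, not code from the paper.

```python
def pos_in_neg_out(edges, labels):
    """Return (pos_in, neg_out) for edges (u, v, sign) and a community label per vertex.

    Assumed definitions (the paper's exact denominators may differ):
      pos_in  = positive intra-community edges / all intra-community edges
      neg_out = negative inter-community edges / all inter-community edges
    An empty category scores 0.0 (also an assumption).
    """
    within = pos_within = between = neg_between = 0
    for u, v, sign in edges:
        if labels[u] == labels[v]:
            within += 1
            if sign > 0:
                pos_within += 1
        else:
            between += 1
            if sign < 0:
                neg_between += 1
    pos_in = pos_within / within if within else 0.0
    neg_out = neg_between / between if between else 0.0
    return pos_in, neg_out


# Two communities, {0, 1} and {2, 3}.
edges = [(0, 1, 1), (2, 3, 1), (1, 2, -1), (0, 3, 1), (0, 2, -1)]
pos_in, neg_out = pos_in_neg_out(edges, labels=[0, 0, 1, 1])
print(pos_in, neg_out, pos_in + neg_out)  # cumulative score under the assumed definitions
```

Both intra-community edges are positive here, and two of the three inter-community edges are negative. A clustering is therefore rewarded both for keeping friends together and for separating foes.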

Paper Structure

This paper contains 17 sections, 4 theorems, 8 equations, 9 figures, 5 tables, 3 algorithms.

Key Result

Corollary 3.1

For a connected graph $G$ with a spanning tree $T$, the fundamental cycle basis consists of the cycles formed by adding a single edge outside $T$ to the unique path in $T$ between that edge's endpoints. For a graph $G$ with $n$ vertices and $m$ edges, there are exactly $m-n+1$ fundamental cycles because the spanning tree includes $n-1$ edges, leaving $m-(n-1)$ non-tree edges, each of which closes exactly one cycle.
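The counting argument in Corollary 3.1 can be checked with a short Python sketch. It assumes a connected, undirected graph given as an edge list over vertices `0..n-1`. It builds a breadth-first spanning tree rooted at vertex 0, though any spanning tree works. `fundamental_cycles` is a hypothetical name, and the code is an illustration, not the paper's implementation.

```python
from collections import deque


def fundamental_cycles(n, edges):
    """Fundamental cycle basis of a connected undirected graph.

    Builds a BFS spanning tree rooted at vertex 0; every non-tree edge (u, v)
    closes exactly one cycle with the tree path between u and v.
    Returns one vertex list per cycle (the closing edge joins last to first).
    """
    adj = [[] for _ in range(n)]
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))

    parent, depth, seen = [-1] * n, [0] * n, [False] * n
    tree_edges = set()
    seen[0] = True
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v, idx in adj[u]:
            if not seen[v]:
                seen[v] = True
                parent[v] = u
                depth[v] = depth[u] + 1
                tree_edges.add(idx)
                queue.append(v)
    if not all(seen):
        raise ValueError("graph must be connected")

    cycles = []
    for idx, (u, v) in enumerate(edges):
        if idx in tree_edges:
            continue
        # Climb from both endpoints to their lowest common ancestor (LCA).
        left, right, a, b = [u], [v], u, v
        while depth[a] > depth[b]:
            a = parent[a]
            left.append(a)
        while depth[b] > depth[a]:
            b = parent[b]
            right.append(b)
        while a != b:
            a = parent[a]
            left.append(a)
            b = parent[b]
            right.append(b)
        # left ends at the LCA; append right reversed without repeating the LCA.
        cycles.append(left + right[-2::-1])
    return cycles


edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]  # 4-cycle plus one chord
cycles = fundamental_cycles(4, edges)
print(len(cycles))  # 2 == m - n + 1
```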

Figures (9)

  • Figure 1: Dotted lines are negative edges, whereas solid lines are positive edges. Top: a graph $G$; a balanced graph $\Sigma_1$ obtained by switching $v_1$ (changing the sign of every edge incident to it); a balanced graph $\Sigma_2$ obtained by switching $v_1$ and $v_2$; a balanced graph $\Sigma_3$ obtained by switching $v_2$ and $v_3$; and a balanced graph $\Sigma_4$ obtained by switching $v_4$. $\Sigma_5$, $\Sigma_6$, $\Sigma_7$, and $\Sigma_8$ are examples of unbalanced states. Bottom: Harary cuts of the nearest stable states [2021Cloud] of $\Sigma_5$.
  • Figure 2: An illustration of the advantage of measuring the quality of a clustering assignment in a signed graph by the sum of the fraction of positive edges within communities and the fraction of negative edges between communities. Dotted lines are negative edges, whereas solid lines are positive edges.
  • Figure 3: The GraphC pipeline. Execution starts at the "Connected component in set processed?" block.
  • Figure 4: Execution of the GraphC algorithm on the Highland signed graph, which comprises 16 vertices and 58 edges.
  • Figure 5: Example output of our proposed algorithm on the signed PPI graph, showing the overall improvement after each committed Harary split of every connected component.
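Figure 1 relies on two operations on a signed graph: switching a vertex set and testing balance. The Python sketch below assumes edges are stored as `(u, v, sign)` triples. `switch`, `harary_bipartition`, and `harary_cut` are hypothetical names. Balance is tested with Harary's characterization: a signed graph is balanced if and only if its vertices split into two sides, with every positive edge inside a side and every negative edge between sides. The "Harary cut" is taken here to mean the edges crossing that bipartition. This is an illustration under those assumptions, not the authors' implementation.

```python
from collections import deque


def switch(edges, vertices):
    """Switch a vertex set: negate every edge with exactly one endpoint in the set.

    Switching a single vertex negates all of its incident edges; an edge with
    both endpoints in the set is negated twice and therefore keeps its sign.
    """
    chosen = set(vertices)
    return [(u, v, -s if (u in chosen) != (v in chosen) else s) for u, v, s in edges]


def harary_bipartition(n, edges):
    """Return a side label (0 or 1) per vertex if the graph is balanced, else None."""
    adj = [[] for _ in range(n)]
    for u, v, s in edges:
        adj[u].append((v, s))
        adj[v].append((u, s))
    side = [None] * n
    for start in range(n):
        if side[start] is not None:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, s in adj[u]:
                # Positive edges keep the side; negative edges flip it.
                want = side[u] if s > 0 else 1 - side[u]
                if side[v] is None:
                    side[v] = want
                    queue.append(v)
                elif side[v] != want:
                    return None  # a cycle with negative sign product: unbalanced
    return side


def harary_cut(edges, side):
    """Edges crossing the bipartition; in a balanced state these are the negative edges."""
    return [(u, v, s) for u, v, s in edges if side[u] != side[v]]


all_positive = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1), (0, 2, 1)]
sigma1 = switch(all_positive, [1])  # edges 0-1 and 1-2 become negative
side = harary_bipartition(4, sigma1)
print(side, harary_cut(sigma1, side))
```

Switching preserves the sign product around every cycle, so it maps balanced states to balanced states and unbalanced states to unbalanced ones.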

Theorems & Definitions (10)

  • Definition 3.1
  • Definition 3.2
  • Definition 3.3
  • Corollary 3.1
  • Definition 3.4
  • Theorem 3.1: Har2
  • Theorem 3.2
  • Proof 3.1
  • Theorem 3.3
  • Proof 3.2