A multi-core periphery perspective: Ranking via relative centrality

Chandra Sekhar Mukherjee, Jiapeng Zhang

TL;DR

The paper formalizes a multi-core periphery with communities (MCPC) structure in graphs and introduces relative centrality to detect cores while mitigating biases of traditional centrality measures. It presents a meta-algorithm framework MR-Rank with concrete instantiations ($N$-Rank, $RN$-Rank, $N2$-Rank) and validates them on both synthetic MCPC-block models and 11 single-cell RNA-seq datasets, showing improved core representation, better ICEF, and enhanced downstream clustering. It also demonstrates MCPC-like structure in $k$-NN embeddings of vector data (including concentric GMMs) and shows that core-ranked subsets yield more balanced, multi-community-preserving selections with favorable preservation ratios. The work offers a scalable approach to jointly leverage community and core-periphery structures in unsupervised graph learning, with potential extensions to weighted embeddings and non-linear geometries.

Abstract

Community and core-periphery are two widely studied graph structures, with their coexistence observed in real-world graphs (Rombach, Porter, Fowler & Mucha [SIAM J. App. Math. 2014, SIAM Review 2017]). However, the nature of this coexistence is not well understood and has been pointed out as an open problem (Yanchenko & Sengupta [Statistics Surveys, 2023]). In particular, the impact of inferring a graph's core-periphery structure on understanding its community structure has not been well exploited. In this direction, we introduce a novel quantification for graphs with ground truth communities, where each community has a densely connected part (the core), and the rest is more sparse (the periphery), with inter-community edges more frequent between the peripheries. Built on this structure, we propose a new algorithmic concept that we call relative centrality to detect the cores. We observe that core-detection algorithms based on popular centrality measures such as PageRank and degree centrality can be biased, selecting very few vertices from some cores. We show that relative centrality solves this bias issue and provide theoretical and simulation support, as well as experiments on real-world graphs. Core detection is known to have important applications with respect to core-periphery structures. In our model, we show a new application: relative-centrality-based algorithms can select a subset of the vertices such that it contains sufficient vertices from all communities, and points in this subset are better separable into their respective communities. We apply the methods to 11 biological datasets, where they yield a more balanced selection of vertices from all communities, on which clustering algorithms perform better.
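To make the two quantities mentioned above concrete (separability of a selected subset and its balance across communities), here is a minimal Python sketch. It assumes that the intra-community edge fraction (ICEF) of a vertex subset is the fraction of edges in the induced subgraph whose endpoints share a ground-truth label; the paper's exact definition may differ. The toy graph, the function names `icef` and `community_counts`, and the hand-picked core set are all hypothetical and only illustrate the idea.

```python
from collections import Counter


def icef(edges, labels, subset):
    """Assumed ICEF: share of edges inside `subset` joining same-label vertices."""
    keep = set(subset)
    # Keep only edges with both endpoints in the selected subset.
    induced = [(u, v) for u, v in edges if u in keep and v in keep]
    if not induced:
        return 0.0
    same = sum(1 for u, v in induced if labels[u] == labels[v])
    return same / len(induced)


def community_counts(labels, subset):
    """How many selected vertices fall into each ground-truth community."""
    return Counter(labels[v] for v in subset)


# Hypothetical toy graph: communities A = {0..4} and B = {5..9},
# with cores {0, 1, 2} and {5, 6, 7} and peripheries {3, 4} and {8, 9}.
labels = {v: "A" for v in range(5)}
labels.update({v: "B" for v in range(5, 10)})
edges = [
    (0, 1), (0, 2), (1, 2),          # dense core of A
    (5, 6), (5, 7), (6, 7),          # dense core of B
    (3, 0), (4, 1), (8, 5), (9, 6),  # periphery attached to its own core
    (3, 8), (4, 9), (3, 9),          # inter-community edges between peripheries
]

all_vertices = list(range(10))
cores = [0, 1, 2, 5, 6, 7]  # stand-in for the output of a core-ranking step

whole_icef = icef(edges, labels, all_vertices)
core_icef = icef(edges, labels, cores)
core_counts = community_counts(labels, cores)

print(f"ICEF on all vertices: {whole_icef:.3f}")
print(f"ICEF on core subset:  {core_icef:.3f}")
print(f"Core subset per community: {dict(core_counts)}")
```

In this toy instance the core subset has a higher ICEF than the full graph and draws equally from both communities, which is the kind of outcome the abstract attributes to relative-centrality-based selection. The sketch does not implement any ranking algorithm from the paper.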

Paper Structure

This paper contains 33 sections, 8 theorems, 19 equations, 19 figures, 5 tables, 2 algorithms.

Key Result

Theorem 2.5

Let $G(V,E)$ be a graph sampled from the $\mathsf{MCPC}$-block model w.r.t. the partition of $V$ into $V_{\ell,c}$, $(\ell,c) \in \{0,1\}^2$, where $k=\omega(\log n)$. Let $F(v)$ be the degree of vertex $v$. Then for any $v_i \in V_{\ell,1}$ we have $F(v_i) = 2k + k\cdot (1\pm o(1))\,\mathsf{CC}_G(V_{\ell,1})$.
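
As an informal reading of this statement (a restatement under stated assumptions, not an additional result from the paper), dividing by $k$ gives the leading-order form

$$\frac{F(v_i)}{k} = 2 + (1 \pm o(1))\,\mathsf{CC}_G(V_{\ell,1}).$$

Assuming that $c=1$ indexes the cores and that the core concentrations are bounded by a constant, taking $v_i \in V_{0,1}$ and $v_j \in V_{1,1}$ and subtracting the two expressions gives

$$F(v_i) - F(v_j) = k\,\bigl(\mathsf{CC}_G(V_{0,1}) - \mathsf{CC}_G(V_{1,1})\bigr) \pm o(k).$$

Under these assumptions, a ranking by degree alone would tend to place vertices of the more concentrated core ahead of those of the other core, which is consistent with the bias of degree centrality described in the abstract.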

Figures (19)

  • Figure 1: Different structures in $3$-regular directed graphs
  • Figure 2: Intra community edge fraction (ICEF) improvement and balancedness due to core-ranking
  • Figure 3: Parameterized block probabilities
  • Figure 4: Improvement in ICEF and balancedness of core-ranking in concentric GMM
  • Figure 5: Improvement in intra-community accuracy and balancedness by different ranking algorithms
  • ...and 14 more figures

Theorems & Definitions (20)

  • Definition 2.1: Core concentration
  • Definition 2.2
  • Definition 2.3: Performance metrics of CR algorithms
  • Definition 2.4: $\mathsf{MCPC}$-block model
  • Theorem 2.5: Behavior of degree centrality
  • Theorem 2.6: $1$-step N-Rank is good for the two-block model
  • Definition 3.1: Concentric GMM with two communities
  • Definition 3.2: Preservation ratio
  • Theorem A.1: Chernoff-Hoeffding bound
  • Lemma A.2: The graph is almost-regular w.r.t out-degree
  • ...and 10 more