Fast Maximization of Current Flow Group Closeness Centrality

Haisong Xia, Zhongzhi Zhang

TL;DR

This work tackles the problem of maximizing the current flow closeness centrality (CFCC) of a node group $S$ of size $k$ on large-scale graphs, where $C(S)=\dfrac{n}{\mathrm{Tr}(\boldsymbol{L}_{-S}^{-1})}$. It introduces two greedy Monte Carlo algorithms, ForestCFCM and SchurCFCM, based on spanning-forest sampling and the Schur complement, and proves that both achieve a $\left(1-\dfrac{k}{k-1}\cdot\dfrac{1}{\mathrm{e}}-\epsilon\right)$-approximation in nearly linear time. ForestCFCM relies on unbiased forest-based estimators and adaptive sampling, while SchurCFCM uses an auxiliary root set $T$ to obtain stronger diagonal dominance and faster sampling. Experiments on real networks show speedups of up to 370× over the state-of-the-art method, with SchurCFCM delivering the best overall efficiency and effectiveness. Both methods make CFCC maximization, and hence the identification of crucial node groups, feasible on graphs with millions of nodes.
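
To make the objective concrete, the following is a minimal sketch of an exact evaluation of $C(S)$ and a plain greedy selection on a tiny graph. It is not ForestCFCM or SchurCFCM: it inverts the matrix exactly instead of sampling, so it only suits very small inputs. It assumes that $\boldsymbol{L}_{-S}$ denotes the graph Laplacian with the rows and columns of $S$ removed, that the graph is connected and unweighted, and that $k<n$; the function names are hypothetical.

```python
from fractions import Fraction


def laplacian(n, edges):
    """Laplacian L = D - A of an unweighted, undirected graph on nodes 0..n-1."""
    L = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1
        L[v][v] += 1
        L[u][v] -= 1
        L[v][u] -= 1
    return L


def inverse_trace(M):
    """Trace of M^{-1} by exact Gauss-Jordan elimination (small matrices only).

    Raises StopIteration if M is singular, e.g. for a disconnected graph.
    """
    m = len(M)
    A = [row[:] + [Fraction(int(i == j)) for j in range(m)]
         for i, row in enumerate(M)]
    for col in range(m):
        pivot = next(r for r in range(col, m) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [x / p for x in A[col]]
        for r in range(m):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return sum(A[i][m + i] for i in range(m))


def group_cfcc(n, edges, S):
    """C(S) = n / Tr(L_{-S}^{-1}); assumes S is non-empty and smaller than n."""
    L = laplacian(n, edges)
    keep = [v for v in range(n) if v not in S]
    L_sub = [[L[i][j] for j in keep] for i in keep]
    return Fraction(n) / inverse_trace(L_sub)


def greedy_cfcm(n, edges, k):
    """Add, k times, the node whose inclusion gives the largest C(S)."""
    S = []
    for _ in range(k):
        candidates = [v for v in range(n) if v not in S]
        S.append(max(candidates, key=lambda v: group_cfcc(n, edges, S + [v])))
    return S
```

On the path 0-1-2, `group_cfcc(3, [(0, 1), (1, 2)], [1])` evaluates to 3/2 while choosing an endpoint gives 1, so the greedy step picks the middle node.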

Abstract

Derived from effective resistances, the current flow closeness centrality (CFCC) for a group of nodes measures the importance of node groups in an undirected graph with $n$ nodes. Given the widespread applications of identifying crucial nodes, we investigate the problem of maximizing CFCC for a node group $S$ subject to the cardinality constraint $|S|=k\ll n$. Although this problem is known to be NP-hard, we propose two novel greedy algorithms for solving it. Our algorithms, based on spanning forest sampling and the Schur complement, run in nearly linear time and achieve an approximation factor of $1-\frac{k}{k-1}\cdot\frac{1}{\mathrm{e}}-\epsilon$ for any $0<\epsilon<1$. Extensive experiments on real-world graphs show that our algorithms outperform the state-of-the-art method in both efficiency and effectiveness, scaling to graphs with millions of nodes.
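
The abstract names spanning forest sampling as a building block but does not say how forests are drawn. One standard way to sample a uniformly random spanning forest rooted at a node set is Wilson's algorithm (loop-erased random walks); the sketch below assumes that choice, an unweighted connected graph, and hypothetical names (`sample_rooted_forest`, `adj`, `roots`). It only produces the forest and does not implement the paper's estimators.

```python
import random


def sample_rooted_forest(adj, roots, rng):
    """Sample a spanning forest rooted at `roots` with Wilson's algorithm.

    adj   : dict mapping each node to a list of its neighbours
    roots : iterable of root nodes (assumed reachable from every node)
    rng   : random.Random instance
    Returns a dict mapping every non-root node to its parent in the forest.
    """
    in_forest = set(roots)
    parent = {}
    for start in adj:
        # Random walk from `start` until the current forest is hit.
        # Overwriting parent[node] on each revisit erases loops implicitly.
        node = start
        while node not in in_forest:
            parent[node] = rng.choice(adj[node])
            node = parent[node]
        # Retrace the loop-erased path and attach it to the forest.
        node = start
        while node not in in_forest:
            in_forest.add(node)
            node = parent[node]
    return parent


if __name__ == "__main__":
    # 4-cycle with node 0 as the only root (illustrative input).
    cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    print(sample_rooted_forest(cycle, [0], random.Random(0)))
```

Following the returned parent pointers from any node leads to a root, so each sample partitions the nodes into trees, one per root.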

Paper Structure

This paper contains 32 sections, 17 theorems, 32 equations, 5 figures, 2 tables, 5 algorithms.

Key Result

Lemma 3.1

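In the corresponding electrical network of a graph $G$, suppose a unit current flows from $u$ to $v$. The current through the edge $(a,b)$ is then given by $\frac{1}{N}\left(N_{u,v}^{a\to b}-N_{u,v}^{b\to a}\right)$, where $N$ is the number of spanning trees of $G$ and $N_{u,v}^{a\to b}$ is the number of spanning trees in which the unique path from $u$ to $v$ traverses the edge $(a,b)$ in the direction from $a$ to $b$.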
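
As a sanity check of Lemma 3.1, the sketch below compares the spanning-tree count with the current obtained from node potentials on a small graph. It assumes unit resistances on all edges, $u\neq v$, and the standard reading in which $N$ counts spanning trees; the brute-force enumeration is for illustration only and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def spanning_trees(n, edges):
    """Yield every spanning tree as a tuple of n-1 edges (brute force)."""
    for subset in combinations(edges, n - 1):
        parent = list(range(n))

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        acyclic = True
        for a, b in subset:
            ra, rb = find(a), find(b)
            if ra == rb:
                acyclic = False
                break
            parent[ra] = rb
        if acyclic:
            yield subset


def tree_path(tree, u, v):
    """Unique path from u to v inside a tree, as a list of nodes."""
    adj = {}
    for a, b in tree:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    stack = [(u, [u])]
    while stack:
        node, path = stack.pop()
        if node == v:
            return path
        for nxt in adj.get(node, []):
            if len(path) < 2 or nxt != path[-2]:
                stack.append((nxt, path + [nxt]))


def current_by_tree_counting(n, edges, u, v, a, b):
    """(N^{a->b} - N^{b->a}) / N from Lemma 3.1."""
    total = forward = backward = 0
    for tree in spanning_trees(n, edges):
        total += 1
        path = tree_path(tree, u, v)
        steps = list(zip(path, path[1:]))
        forward += (a, b) in steps
        backward += (b, a) in steps
    return Fraction(forward - backward, total)


def current_by_potentials(n, edges, u, v, a, b):
    """Ground v, inject one unit at u, solve for potentials; return phi_a - phi_b."""
    idx = [x for x in range(n) if x != v]
    pos = {x: i for i, x in enumerate(idx)}
    m = n - 1
    A = [[Fraction(0)] * (m + 1) for _ in range(m)]
    for x, y in edges:
        for p, q in ((x, y), (y, x)):
            if p != v:
                A[pos[p]][pos[p]] += 1
                if q != v:
                    A[pos[p]][pos[q]] -= 1
    A[pos[u]][m] = Fraction(1)
    for col in range(m):  # exact Gauss-Jordan elimination
        pivot = next(r for r in range(col, m) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        piv = A[col][col]
        A[col] = [x / piv for x in A[col]]
        for r in range(m):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    phi = {x: A[pos[x]][m] for x in idx}
    phi[v] = Fraction(0)
    return phi[a] - phi[b]
```

On a triangle with $u=0$ and $v=1$, two of the three spanning trees route the path from $u$ to $v$ through the edge $(0,1)$, so both functions return 2/3 for that edge.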

Figures (5)

  • Figure 1: CFCC $C(S)$ of node set $S$ computed by different algorithms on four tiny-scale graphs: Zebra (a), Karate (b), Cont. USA (c) and Dolphins (d).
  • Figure 2: CFCC $C(S)$ of node set $S$ computed by different algorithms on small-scale graphs: Hamsterster (a), web-EPA (b), Routeviews (c), soc-PagesGov (d), Astro-Ph (e) and EmailEnron (f).
  • Figure 3: CFCC $C(S)$ of node set $S$ computed by different algorithms on large-scale graphs: Livemocha (a), WordNet (b), Gowalla (c) and com-DBLP (d).
  • Figure 4: Running time of different algorithms with varying error parameter $\epsilon$ on real-world graphs: Euroroads (a), soc-PagesGov (b), EmailEnron (c), com-DBLP (d), Skitter (e) and sc-rel9 (f).
  • Figure 5: Relative difference of different algorithms with varying error parameter $\epsilon$ on small-scale graphs: Facebook (a), GR-QC (b), web-EPA (c), Routeviews (d), HEP-Th (e) and CAIDA (f).

Theorems & Definitions (20)

  • Definition 2.1: $\epsilon$-approximation
  • Definition 2.2: Current Flow Closeness Maximization, CFCM
  • Lemma 3.1
  • Lemma 3.2
  • Lemma 3.3
  • Lemma 3.4: JL Lemma [JoLi84]
  • Lemma 3.5: [BoRaZh11]
  • Lemma 3.6
  • Lemma 3.7
  • Lemma 3.8: Hoeffding's inequality
  • ...and 10 more