A Note on Computing Betweenness Centrality from the 2-core

Charalampos E. Tsourakakis

TL;DR

This note addresses the computational cost of betweenness centrality (BC) by deriving a recursive formula that computes BC in a graph from BC in its 2-core, obtained by progressively peeling degree-1 nodes. It introduces a one-round peeling algorithm that augments Brandes' framework with delta and zeta quantities capturing the contributions of removed leaves, and it analyzes memory-efficient and sampling-based variants for exact and approximate BC computation. The theoretical results establish a recurrence linking BC in the original graph to BC in the 2-core, together with a time-complexity analysis that shows asymptotic speedups on graphs with many degree-1 nodes or small cores. Experiments on synthetic and real networks confirm substantial speedups and lower estimation error, particularly for small pivot sets. The approach is most useful for large graphs, such as knowledge graphs, in which degree-1 nodes are prevalent: it enables faster analysis with limited memory and fewer sampled pivots.
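
The peeling step mentioned in this summary can be illustrated with a minimal sketch. It is not the paper's algorithm: it only strips degree-1 (and isolated) nodes until the 2-core remains, and it does not compute the delta and zeta corrections. The function name `peel_to_two_core`, the adjacency-dict input format, and the example graph are assumptions made for illustration. A queue is used here instead of explicit peeling rounds.

```python
from collections import deque


def peel_to_two_core(adj):
    """Peel nodes of degree <= 1 until none remain.

    adj maps each node to a list of its neighbours (undirected, simple graph).
    Returns (core, order): the node set of the 2-core and the removal order.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    removed = set()
    order = []
    # Start from every node that is already a leaf or isolated.
    queue = deque(v for v, d in degree.items() if d <= 1)
    while queue:
        v = queue.popleft()
        if v in removed:
            continue
        removed.add(v)
        order.append(v)
        for u in adj[v]:
            if u not in removed:
                degree[u] -= 1
                # u just became a leaf of the remaining graph.
                if degree[u] == 1:
                    queue.append(u)
    core = set(adj) - removed
    return core, order


# Hypothetical example: a triangle a-b-c with a path c-d-e and a leaf f on b.
example_adj = {
    "a": ["b", "c"],
    "b": ["a", "c", "f"],
    "c": ["a", "b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
    "f": ["b"],
}
core, order = peel_to_two_core(example_adj)
print(sorted(core))  # ['a', 'b', 'c']
print(order)         # ['e', 'f', 'd']
```

On a tree the sketch removes every node and returns an empty core, which matches the fact that a tree has no 2-core.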

Abstract

A central task in network analysis is to identify important nodes in a graph. Betweenness centrality (BC) is a popular centrality measure that captures the significance of a node based on the number of shortest paths that pass through it. In this note, we derive a recursive formula to compute the betweenness centralities of a graph from the betweenness centralities of its 2-core. Furthermore, we mathematically analyze the significant impact of removing degree-1 nodes on the estimation of betweenness centrality in the popular pivot sampling scheme based on Single-Source Shortest Path (SSSP) computations, as described in the Brandes-Pich approach and implemented in widely used software such as NetworkX. We demonstrate both theoretically and empirically that removing degree-1 nodes can reduce the number of sampled pivots needed to reach a given accuracy, thereby decreasing the overall runtime.
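
For readers unfamiliar with pivot sampling, the baseline scheme referred to in the abstract can be sketched as follows. This is a simplified illustration of the plain pivot estimator on an unweighted, undirected graph, not the peeling-based estimator analyzed in the paper. The function names, the adjacency-dict format, sampling pivots without replacement, and the scaling by n/k are assumptions of this sketch. Dependencies are summed over ordered source-target pairs, so values are twice those of the unordered-pair convention.

```python
import random
from collections import deque


def single_source_dependencies(adj, s):
    """Brandes-style accumulation of the dependencies of source s (BFS, unweighted)."""
    sigma = {v: 0 for v in adj}    # number of shortest s-v paths
    dist = {v: -1 for v in adj}    # BFS distance from s, -1 if unreached
    preds = {v: [] for v in adj}   # predecessors on shortest paths
    sigma[s], dist[s] = 1, 0
    order = []
    queue = deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if dist[w] < 0:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                preds[w].append(v)
    delta = {v: 0.0 for v in adj}
    # Back-propagate dependencies in order of non-increasing distance.
    for w in reversed(order):
        for v in preds[w]:
            delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
    delta[s] = 0.0  # a source is never an interior node of its own paths
    return delta


def estimate_bc(adj, k, seed=0):
    """Estimate BC from k uniformly sampled pivots, scaled by n / k."""
    rng = random.Random(seed)
    nodes = list(adj)
    pivots = rng.sample(nodes, k)
    total = {v: 0.0 for v in nodes}
    for s in pivots:
        delta = single_source_dependencies(adj, s)
        for v in nodes:
            total[v] += delta[v]
    scale = len(nodes) / k
    return {v: scale * t for v, t in total.items()}
```

With `k` equal to the number of nodes, every node is a pivot and the estimate coincides with the exact ordered-pair betweenness; smaller `k` trades accuracy for fewer SSSP computations.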

Paper Structure (21 sections, 11 theorems, 33 equations, 5 figures, 2 tables, 3 algorithms)

Introduction
Related Work
Betweenness centrality in practice
Brandes' algorithm
Speeding up BC computation
Theoretical preliminaries: probabilistic inequalities
Exact Betweenness Centrality Computation
Recursive BC computation from the 2-core
BC from 1-round of peeling
Time complexity
Memory efficient implementation
Sampling after Peeling
Experiments
Experimental setup
Synthetic Experiment
...and 6 more sections

Key Result

Proposition 2.2

Let $X_1, \ldots, X_k$ be independent and identically distributed (iid) random variables with $0 \leq X_i \leq M$ $(i = 1, \ldots, k)$ and an arbitrary $\xi \geq 0$,

Figures (5)

Figure 1: (a) Histogram of the number of peeling rounds. (b) Fraction of degree-1 nodes $\frac{V_1^{(i)}}{n}$ removed in each round $i$ for each dataset.
Figure 2: (a) Partial view of the synthetic graph with core and degree-1 nodes. (b) Relative $\ell_1$ error of Brandes-Pich and our method vs the number of sampled pivots $k \in \{10,20,\ldots,100\}$. (c) BC estimates using a sample of 10 pivots with and without peeling vs groundtruth betweenness centralities.
Figure 3: (a) For a fixed number of random pivots $k = 10$, the relative error of our method approaches 0, in contrast to the prediction by Brandes-Pich [brandes2007centrality], as stated in Theorem \ref{thm:sample}. (b) Speedups over the exact BC algorithm due to Brandes [brandes2001faster] are achieved by both our exact memory-efficient implementation and our approximate version with a sample size of $k = 10$ pivots.
Figure 4: (a) BC estimates using a sample of size 100 with and without peeling vs groundtruth betweenness centralities for the Arxiv GR-QC dataset. (b) Average relative $\ell_1$ error of Brandes-Pich and our method over 5 runs vs the number of sampled pivots for the Arxiv GR-QC, and (c) the email-Eu-core datasets respectively.
Figure 5: Average relative $\ell_1$ error of Brandes-Pich and our method vs the number of sampled pivots $k \in \{5,10,20,\ldots,100\}$ for four real-world networks from Table \ref{tab:data} over five runs.

Theorems & Definitions (19)

Definition 1.1: Betweenness Centrality [freeman1977set]
Definition 2.1
Proposition 2.2: Hoeffding's inequality [hoeffding1994probability]
Proposition 2.3: Bernstein's inequality for bounded distributions [bernstein1924modification]
Theorem 3.1
proof
Corollary 3.2
Theorem 3.3
proof
Theorem 3.4
...and 9 more
