Approximating Spanning Centrality with Random Bouquets

Gökhan Göktürk, Kamer Kaya

TL;DR

This paper tackles the high computational cost of All Edges Spanning Centrality (AESC) by introducing Bouquets, a hash-based sampling approach that clusters random walks into vectorizable groups and arranges them to maximize data locality. The method combines a hash-based random number generator (RNG), SIMD-enabled RandomBouquet generation, and SABA, a scheme for arranging walks for vectorized execution. Together they achieve large speed-ups over the state-of-the-art TGT+ (more than 100× with 16 threads) while preserving approximation quality. Key contributions include a detailed algorithmic design for bouquet-based random walks, extensive evaluations of randomness and cache growth, and practical implementation choices that yield scalable AESC performance on real graphs and synthetic benchmarks. The work demonstrates that high-throughput, accurate AESC approximation is attainable on commodity CPUs, broadening its applicability to network-analysis tasks such as resilience and connectivity studies.
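A hash-based RNG derives each draw from its coordinates (seed, walk, step) instead of advancing a shared generator state, so any step of any walk can be sampled independently of all others. That independence is plausibly what lets many walks share vector lanes. The sketch below is a minimal illustration under stated assumptions: it uses a SplitMix64-style mixer (our choice; this summary does not name the paper's hash), a CSR adjacency layout, and the hypothetical names `mix64`, `hash_rand`, and `random_walk`.

```python
MASK64 = (1 << 64) - 1

def mix64(x: int) -> int:
    """SplitMix64-style finalizer (assumed hash): scrambles a value into 64 well-mixed bits."""
    x &= MASK64
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)

def hash_rand(seed: int, walk_id: int, step: int) -> int:
    """Counter-based draw: a pure function of (seed, walk_id, step), with no shared state."""
    key = mix64(seed ^ (walk_id * 0x9E3779B97F4A7C15))  # per-walk key
    return mix64(key + step)                             # per-step draw

def random_walk(indptr, indices, start, length, seed, walk_id):
    """One walk on a CSR graph; the neighbor of v is chosen by hash_rand(...) % deg(v)."""
    walk = [start]
    v = start
    for step in range(length):
        lo, hi = indptr[v], indptr[v + 1]
        if hi == lo:
            break  # isolated vertex: the walk cannot continue
        # Modulo bias is negligible for 64-bit draws and small degrees.
        v = indices[lo + hash_rand(seed, walk_id, step) % (hi - lo)]
        walk.append(v)
    return walk

# Figure 1's graph in CSR form, labeling A, B, C, D as 0, 1, 2, 3 (our labeling).
indptr = [0, 2, 5, 8, 10]
indices = [1, 2, 0, 2, 3, 0, 1, 3, 1, 2]
print(random_walk(indptr, indices, start=0, length=15, seed=7, walk_id=3))
```

Re-running with the same `(seed, walk_id)` reproduces the walk exactly, and two walks never contend for generator state, which matters once walks are spread across threads or SIMD lanes.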

Abstract

Spanning Centrality is a measure used in network analysis to determine the importance of an edge in a graph based on its contribution to the connectivity of the entire network. Specifically, it quantifies how critical an edge is in terms of the fraction of the graph's spanning trees that include that edge. The current state-of-the-art for All Edges Spanning Centrality (AESC), which computes the exact centrality values for all the edges, has a time complexity of $\mathcal{O}(mn^{3/2})$ for $n$ vertices and $m$ edges. This makes the computation infeasible even for moderately sized graphs. Approximation algorithms instead process a large number of random walks to estimate edge centralities. However, even these algorithms can be computationally overwhelming, especially when the approximation error bound is small. In this work, we propose a novel hash-based sampling method and a vectorized algorithm that greatly improve the execution time by clustering random walks into Bouquets. On synthetic random-walk benchmarks, Bouquets is $7.8\times$ faster than naive, traditional random-walk generation. We also show that the proposed technique is scalable by employing it within a state-of-the-art AESC approximation algorithm, TGT+. The experiments show that using Bouquets yields a speed-up of more than $100\times$ via parallelization with 16 threads.
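To make the definition concrete: for an unweighted, connected graph, the spanning centrality of an edge is the number of spanning trees containing it divided by the total number of spanning trees. The brute-force sketch below enumerates every $(n-1)$-edge subset and keeps the acyclic ones. It is exponential and only usable on tiny graphs, which is why exact $\mathcal{O}(mn^{3/2})$ methods and random-walk approximations matter. The function names are hypothetical, and the graph is assumed simple and connected.

```python
from fractions import Fraction
from itertools import combinations

def is_spanning_tree(n, edges):
    """For exactly n-1 edges on vertices 0..n-1: True iff they contain no cycle (hence span)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # u and v already connected: this edge closes a cycle
        parent[ru] = rv
    return True

def spanning_centrality(n, edges):
    """Exact SC by enumeration: the share of spanning trees that contain each edge."""
    trees = [t for t in combinations(edges, n - 1) if is_spanning_tree(n, t)]
    return {e: Fraction(sum(e in t for t in trees), len(trees)) for e in edges}

# Example: the 4-cycle 0-1-2-3-0.
print(spanning_centrality(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))  # every edge maps to Fraction(3, 4)
```

In a 4-cycle, dropping any one of the 4 edges leaves a spanning tree, so there are 4 trees and each edge lies in 3 of them. Because every spanning tree has $n-1$ edges, the SC values of any connected graph always sum to $n-1$.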

Paper Structure

This paper contains 20 sections, 11 equations, 5 figures, 7 tables, and 3 algorithms.

Figures (5)

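  • Figure 1: A sample graph with 4 vertices and 5 undirected edges. Vertices $A$ and $D$ are degree-2 vertices, and $B$ and $C$ are degree-3 vertices, so the edges are $AB$, $AC$, $BC$, $BD$, and $CD$. Of the graph's 8 spanning trees, 4 contain $BC$ and 5 contain each of the other edges, so $BC$ has an SC value of $\frac{1}{2}$ and the other four edges have $\frac{5}{8}$.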
  • Figure 2: A set of random walks generated from the sample graph in Figure 1 with no judicious arrangement (top) and their vectorized execution (bottom).
  • Figure 3: A set of random walks generated from the sample graph in Figure 1 with SABA (top) and their vectorized execution (bottom).
  • Figure 4: Speed-up over the TGT+ implementation using a single thread and $\epsilon=0.005$.
  • Figure 5: Speed-up over the naive implementation using 16 threads (walk length = 15, number of walks = 16,384).
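Figures 2 and 3 contrast unarranged and SABA-arranged walks under vectorized execution. The sketch below shows only the underlying lockstep idea: a group of walks is stored as one array per step (structure-of-arrays), so advancing all walks by one step is a handful of whole-array operations (degree lookup, hash, gather) rather than a per-walk loop. It is a minimal illustration, not the paper's RandomBouquet or SABA. NumPy arrays stand in for SIMD lanes, `random_bouquet` is a hypothetical name, the SplitMix64-style hash is our assumption, no reordering of walks is attempted, and every vertex is assumed to have at least one neighbor.

```python
import numpy as np

# SplitMix64-style finalizer on uint64 arrays (assumed hash; the paper's may differ).
# NumPy wraps uint64 array arithmetic modulo 2**64, which is what the mixing relies on.
def mix64(x):
    x = (x ^ (x >> np.uint64(30))) * np.uint64(0xBF58476D1CE4E5B9)
    x = (x ^ (x >> np.uint64(27))) * np.uint64(0x94D049BB133111EB)
    return x ^ (x >> np.uint64(31))

def random_bouquet(indptr, indices, starts, length, seed):
    """Advance len(starts) walks in lockstep on a CSR graph.

    Row s of the result holds every walk's vertex after s steps.
    Assumes no isolated vertices (a zero degree would make the modulo undefined).
    """
    indptr = np.asarray(indptr, dtype=np.int64)
    indices = np.asarray(indices, dtype=np.int64)
    cur = np.asarray(starts, dtype=np.int64)
    ids = np.arange(cur.size, dtype=np.uint64)  # one lane per walk
    key = mix64((ids * np.uint64(0x9E3779B97F4A7C15)) ^ np.uint64(seed))  # per-walk keys
    out = np.empty((length + 1, cur.size), dtype=np.int64)
    out[0] = cur
    for step in range(length):
        lo = indptr[cur]                                 # first neighbor slot per lane
        deg = (indptr[cur + 1] - lo).astype(np.uint64)   # degree per lane
        r = mix64(key + np.uint64(step))                 # counter-based draws, all lanes at once
        cur = indices[lo + (r % deg).astype(np.int64)]   # gather the next vertices
        out[step + 1] = cur
    return out

# Figure 1's graph with A..D labeled 0..3 (our labeling), two walks per vertex.
indptr = [0, 2, 5, 8, 10]
indices = [1, 2, 0, 2, 3, 0, 1, 3, 1, 2]
bouquet = random_bouquet(indptr, indices, [0, 1, 2, 3, 0, 1, 2, 3], 15, seed=42)
print(bouquet.T)  # one row per walk
```

The gather `indices[lo + ...]` touches scattered memory when lanes sit on unrelated vertices; arranging walks so that lanes access nearby data is the kind of locality problem the figures attribute to SABA.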