Parallel $k$-Core Decomposition: Theory and Practice
Youzhe Liu, Xiaojun Dong, Yan Gu, Yihan Sun
TL;DR
The paper addresses fast, work-efficient parallel $k$-core decomposition on large graphs. It proposes a simple frontier-based framework with $O(n+m)$ work and improves parallelism through two techniques: a sampling scheme that reduces contention on high-degree vertices, and Vertical Granularity Control (VGC), which mitigates scheduling overhead for low-degree vertices. A Hierarchical Bucketing Structure (HBS) further improves frontier management on graphs with high coreness values. On a 96-core machine, the combined approach outperforms ParK, PKC, and Julienne on 23 of 25 graphs, with speedups of up to $315\times$ over ParK, $33.4\times$ over PKC, and $52.5\times$ over Julienne, and it scales well on both dense and sparse graphs. The work shows that work-efficiency and high parallelism can be achieved together in practical implementations, and its techniques are reusable for parallel graph peeling and related problems. These advances enable faster exact $k$-core decomposition in real-world analytics and graph mining tasks.
Abstract
This paper proposes efficient solutions for $k$-core decomposition with high parallelism. The problem of $k$-core decomposition is fundamental in graph analysis and has applications across various domains. However, existing algorithms face significant challenges in achieving work-efficiency in theory and/or high parallelism in practice, and suffer from various performance bottlenecks. We present a simple, work-efficient parallel framework for $k$-core decomposition that is easy to implement and adaptable to various strategies for improving work-efficiency. We introduce two techniques to enhance parallelism: a sampling scheme to reduce contention on high-degree vertices, and vertical granularity control (VGC) to mitigate scheduling overhead for low-degree vertices. Furthermore, we design a hierarchical bucket structure to optimize performance for graphs with high coreness values. We evaluate our algorithm on a diverse set of real-world and synthetic graphs. Compared to state-of-the-art parallel algorithms, including ParK, PKC, and Julienne, our approach demonstrates superior performance on 23 out of 25 graphs when tested on a 96-core machine. Our algorithm shows speedups of up to 315$\times$ over ParK, 33.4$\times$ over PKC, and 52.5$\times$ over Julienne.
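To make the peeling process behind $k$-core decomposition concrete, the following is a minimal sequential sketch of frontier-based peeling in Python. It is an illustration only, not the authors' parallel implementation: the function name `coreness` and the adjacency-dictionary input are assumptions made for this example, and it includes neither the sampling scheme, VGC, nor the hierarchical bucket structure. Because it rescans the remaining vertices at the start of every round $k$, it is also not work-efficient; avoiding that kind of rescan is the role a bucketing structure typically plays.

```python
def coreness(adj):
    """Return a dict mapping each vertex to its coreness.

    adj: dict mapping vertex -> list of neighbors (undirected simple graph;
         every edge must appear in both endpoints' lists).
    Hypothetical helper for illustration; sequential, not work-efficient.
    """
    deg = {v: len(ns) for v, ns in adj.items()}  # induced degree so far
    core = {}
    remaining = set(adj)
    k = 0
    while remaining:
        # Initial frontier of round k: remaining vertices with degree <= k.
        frontier = [v for v in remaining if deg[v] <= k]
        while frontier:
            # Peel the whole frontier: these vertices have coreness k.
            for v in frontier:
                core[v] = k
                remaining.discard(v)
            next_frontier = []
            for v in frontier:
                for u in adj[v]:
                    if u in remaining:
                        deg[u] -= 1
                        # u joins the next frontier exactly once, at the
                        # moment its degree first drops to k.
                        if deg[u] == k:
                            next_frontier.append(u)
            frontier = next_frontier
        k += 1
    return core


# Example: a triangle {0, 1, 2} with a pendant vertex 3 attached to vertex 2.
example = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
result = coreness(example)
```

In this example the pendant vertex 3 is peeled in round $k=1$ and the triangle vertices in round $k=2$, so vertex 3 has coreness 1 and vertices 0, 1, and 2 have coreness 2. In a parallel setting, the two inner loops over the frontier are the natural place to process vertices concurrently, with the degree decrement performed atomically; that concurrent decrement is where contention on high-degree vertices arises.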
