Parallel $k$-Core Decomposition: Theory and Practice

Youzhe Liu; Xiaojun Dong; Yan Gu; Yihan Sun

TL;DR

The paper addresses fast, work-efficient parallel $k$-core decomposition on large graphs. It proposes a simple frontier-based framework with $O(n+m)$ work and adds two techniques to improve parallelism: a sampling scheme that reduces contention on high-degree vertices, and vertical granularity control (VGC), which mitigates scheduling overhead for low-degree vertices. A hierarchical bucket structure (HBS) further improves performance on graphs with high coreness values. On a 96-core machine, the combined approach outperforms ParK, PKC, and Julienne on 23 of 25 real-world and synthetic graphs, with speedups of up to $315\times$ over ParK, $33.4\times$ over PKC, and $52.5\times$ over Julienne. The work shows that work-efficiency and high parallelism can be achieved together in a practical implementation, and its techniques may carry over to parallel graph peeling and related problems.

Abstract

This paper proposes efficient solutions for $k$-core decomposition with high parallelism. The problem of $k$-core decomposition is fundamental in graph analysis and has applications across various domains. However, existing algorithms face significant challenges in achieving work-efficiency in theory and/or high parallelism in practice, and suffer from various performance bottlenecks. We present a simple, work-efficient parallel framework for $k$-core decomposition that is easy to implement and adaptable to various strategies for improving work-efficiency. We introduce two techniques to enhance parallelism: a sampling scheme to reduce contention on high-degree vertices, and vertical granularity control (VGC) to mitigate scheduling overhead for low-degree vertices. Furthermore, we design a hierarchical bucket structure to optimize performance for graphs with high coreness values. We evaluate our algorithm on a diverse set of real-world and synthetic graphs. Compared to state-of-the-art parallel algorithms, including ParK, PKC, and Julienne, our approach demonstrates superior performance on 23 out of 25 graphs when tested on a 96-core machine. Our algorithm shows speedups of up to 315$\times$ over ParK, 33.4$\times$ over PKC, and 52.5$\times$ over Julienne.
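To make the peeling idea behind a frontier-based framework concrete, here is a minimal sequential sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: the function name `coreness_by_peeling`, the adjacency-list input, and the lazily filtered per-degree buckets are all choices made for this example, and the sampling scheme, VGC, and the hierarchical bucket structure are not modeled. The inner loop over `frontier` is the part a parallel version would run concurrently, with atomic decrements in place of `deg[u] -= 1`.

```python
def coreness_by_peeling(adj):
    """Compute the coreness of every vertex by round-based peeling.

    adj: list of neighbour lists for an undirected simple graph
         (assumed: no self-loops, no duplicate edges, vertices 0..n-1).
    Returns a list `core` where core[v] is the coreness of v.
    """
    n = len(adj)
    deg = [len(ns) for ns in adj]          # induced degree, updated as we peel
    core = [0] * n
    max_deg = max(deg, default=0)

    # buckets[d] holds vertices whose degree was d at some point.
    # Entries can go stale when a degree drops further; they are filtered lazily.
    buckets = [[] for _ in range(max_deg + 1)]
    for v in range(n):
        buckets[deg[v]].append(v)

    for k in range(max_deg + 1):
        # Initial frontier of round k: vertices whose current degree is exactly k.
        # Peeled vertices keep a degree below k, so this test also skips them.
        frontier = [v for v in buckets[k] if deg[v] == k]
        while frontier:
            next_frontier = []
            for v in frontier:             # conceptually a parallel loop
                core[v] = k
                for u in adj[v]:
                    if deg[u] > k:         # u has not been peeled or queued yet
                        deg[u] -= 1
                        if deg[u] == k:
                            next_frontier.append(u)   # u joins this round
                        else:
                            buckets[deg[u]].append(u)  # revisit in a later round
            frontier = next_frontier
    return core


if __name__ == "__main__":
    # Triangle 0-1-2 with a pendant vertex 3 attached to vertex 1.
    graph = [[1, 2], [0, 2, 3], [0, 1], [1]]
    print(coreness_by_peeling(graph))
```

Each vertex is peeled once and each edge triggers at most one decrement per endpoint, so the sketch performs $O(n+m)$ work in total. The stale bucket entries are bounded by the number of decrements, which keeps the lazy filtering within the same bound.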
