PICO: Accelerating All k-Core Paradigms on GPU

Chen Zhao, Ting Yu, Zhigao Zheng, Song Jin, Jiawei Jiang, Bo Du, Dacheng Tao

TL;DR

This work targets the computational bottlenecks of k-core decomposition on GPUs by presenting PICO, a unified framework that optimizes both the Peel and Index2core paradigms. It introduces PeelOne, which uses the under-core concept and an assertion-based atomic operation to reduce synchronization and atomic overhead, and HistoCore, which combines cnt-based frontier selection with a histogram-maintenance strategy to minimize redundant edge accesses. Across 24 diverse graphs on an NVIDIA RTX 3090 GPU, PeelOne outperforms state-of-the-art Peel implementations, while HistoCore outperforms other Index2core methods and even surpasses PeelOne on several datasets. The results indicate that carefully designed, synchronization-aware parallel strategies can close the gap between Peel and Index2core on GPUs, enabling scalable, high-performance k-core decomposition for large-scale graphs.
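
For orientation, the Peel paradigm named above can be sketched sequentially: vertices whose remaining degree is at most k are removed level by level, and the level at which a vertex is removed is its coreness. This is the textbook bottom-up procedure, not PeelOne itself and not GPU code; the names `build_adj` and `peel_coreness` are illustrative and do not come from the paper.

```python
def build_adj(edges, n):
    """Build an undirected adjacency map for vertices 0..n-1 (illustrative helper)."""
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def peel_coreness(adj):
    """Textbook sequential Peel: remove vertices of remaining degree <= k, level by level."""
    deg = {v: len(ns) for v, ns in adj.items()}
    removed = set()
    core = {}
    k = 0
    while len(removed) < len(adj):
        # Frontier for the current level: remaining vertices with degree <= k.
        frontier = [v for v in adj if v not in removed and deg[v] <= k]
        if not frontier:
            k += 1  # nothing left to remove at this level, move to the next one
            continue
        while frontier:
            v = frontier.pop()
            if v in removed:
                continue  # a vertex may have been queued more than once
            removed.add(v)
            core[v] = k
            for u in adj[v]:
                if u not in removed:
                    deg[u] -= 1
                    if deg[u] <= k:
                        frontier.append(u)  # u now belongs to the same level
    return core


# A triangle (0, 1, 2), a pendant vertex 3 attached to 2, and an isolated vertex 4.
example = build_adj([(0, 1), (1, 2), (0, 2), (2, 3)], 5)
print(peel_coreness(example))
```

In a GPU version the inner loop runs over the whole frontier in parallel, which is where the degree decrements become atomic operations.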

Abstract

Core decomposition is a well-established graph mining problem with various applications that involves partitioning the graph into hierarchical subgraphs. Solutions to this problem have been developed using both bottom-up and top-down approaches from the perspective of vertex convergence dependency. However, existing algorithms have not effectively harnessed GPU performance to expedite core decomposition, despite the growing need for enhanced performance. Moreover, approaching the performance limits of core decomposition from two different directions within a parallel synchronization structure has not been thoroughly explored. This paper introduces an efficient GPU acceleration framework, PICO, for the Peel and Index2core paradigms of k-core decomposition. We propose PeelOne, a Peel-based algorithm designed to simplify the parallel logic and minimize atomic operations by eliminating vertices that are 'under-core'. We also propose an Index2core-based algorithm, named HistoCore, which addresses the issue of extensive redundant computations across both vertices and edges. Extensive experiments on an NVIDIA RTX 3090 GPU show that PeelOne outperforms all other Peel-based algorithms, and HistoCore outperforms all other Index2core-based algorithms. Furthermore, HistoCore even outperforms PeelOne by 1.1x to 3.2x on six datasets, which breaks the stereotype that the Index2core paradigm performs much worse than the Peel paradigm in a shared-memory parallel setting.
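
The top-down direction mentioned in the abstract is commonly described as an h-index iteration: each vertex starts from its degree as an upper bound and repeatedly lowers the bound to the h-index of its neighbours' current bounds until nothing changes. The sketch below is a minimal sequential illustration under that assumption. It is not HistoCore and contains none of its frontier or histogram optimizations; `h_index` and `index2core_coreness` are illustrative names.

```python
def build_adj(edges, n):
    """Build an undirected adjacency map for vertices 0..n-1 (illustrative helper)."""
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def h_index(values):
    """Largest h such that at least h of the given values are >= h."""
    h = 0
    for i, x in enumerate(sorted(values, reverse=True), start=1):
        if x >= i:
            h = i
        else:
            break
    return h


def index2core_coreness(adj):
    """Iterate h-index updates from the degrees until a fixed point is reached."""
    est = {v: len(ns) for v, ns in adj.items()}  # degree is an upper bound on coreness
    changed = True
    while changed:
        changed = False
        new_est = {}
        for v, ns in adj.items():
            # Estimates only ever decrease, so keep the smaller of old and new.
            h = min(est[v], h_index([est[u] for u in ns]))
            new_est[v] = h
            if h != est[v]:
                changed = True
        est = new_est  # synchronous update: all vertices read the previous round
    return est


# A triangle (0, 1, 2), a pendant vertex 3 attached to 2, and an isolated vertex 4.
example = build_adj([(0, 1), (1, 2), (0, 2), (2, 3)], 5)
print(index2core_coreness(example))
```

Every round in this naive form recomputes every vertex and rereads every edge, including those whose inputs did not change; that is the kind of redundant work across vertices and edges the abstract refers to.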

Paper Structure

This paper contains 39 sections, 3 theorems, 8 figures, 7 tables, 6 algorithms.

Key Result

Theorem 1

When locating the $k$-core in $G$, the coreness of an under-core vertex is $k$.
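
One plausible reading of the theorem, sketched sequentially: Definition 1 is not reproduced in this summary, so the code assumes that an under-core vertex is one whose remaining degree would fall below $k$ while level $k$ is being peeled. Under that assumption the decrement can simply be skipped once a degree has reached $k$, and the degree array itself ends up holding the coreness. The function name `peel_with_clamp` and the guard `deg[u] > k` are illustrative choices, not the paper's assertion-based atomic operation.

```python
def build_adj(edges, n):
    """Build an undirected adjacency map for vertices 0..n-1 (illustrative helper)."""
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def peel_with_clamp(adj):
    """Level-by-level peel that never lets a remaining degree drop below k."""
    deg = {v: len(ns) for v, ns in adj.items()}
    done = set()
    k = 0
    while len(done) < len(adj):
        # At the start of level k every remaining vertex has degree >= k.
        frontier = [v for v in adj if v not in done and deg[v] == k]
        while frontier:
            v = frontier.pop()
            done.add(v)
            for u in adj[v]:
                # Assumed guard: only decrement while the degree is above k,
                # so a vertex that would go "under" k stays pinned at k.
                if u not in done and deg[u] > k:
                    deg[u] -= 1
                    if deg[u] == k:
                        frontier.append(u)  # u joins the current level exactly once
        k += 1
    return deg  # the clamped degrees are the coreness values


# A hub (0) with three leaves (1, 2, 3) that is also attached to a triangle (4, 5, 6).
example = build_adj([(0, 1), (0, 2), (0, 3), (0, 4), (4, 5), (5, 6), (4, 6)], 7)
print(peel_with_clamp(example))
```

Without the guard, the triangle vertices would have their degrees pushed below 2 while level 2 is processed, so a separate coreness array would be needed; with it, no decrement is issued for a vertex that is already at level $k$.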

Figures (8)

  • Figure 1: An illustration of $k$-cores and coreness resulting from core decomposition of the example graph $G_1$.
  • Figure 2: The commonly used parallel Peel method.
  • Figure 3: The proportion of vertices and edges that require multiple accesses in the dataset soc-twitter-2010.
  • Figure 4: The atomic operations involved in the reduction of the degree of under-core vertices.
  • Figure 5: The parallel procedure of the PeelOne method.
  • ...and 3 more figures

Theorems & Definitions (4)

  • Definition 1: Under-Core Vertex
  • Theorem 1
  • Corollary 1
  • Theorem 2