PICO: Accelerating All k-Core Paradigms on GPU
Chen Zhao, Ting Yu, Zhigao Zheng, Song Jin, Jiawei Jiang, Bo Du, Dacheng Tao
TL;DR
This work targets the computational bottlenecks of k-core decomposition on GPUs by presenting PICO, a unified framework that optimizes both the Peel and Index2core paradigms. It introduces PeelOne, which uses the under-core concept and an assertion-based atomic operation to reduce synchronization and atomic overhead, and HistoCore, which employs a cnt-based frontier selection with a histogram-maintenance strategy to minimize redundant edge accesses. Across 24 diverse graphs on an NVIDIA RTX 3090, PeelOne outperforms state-of-the-art Peel implementations, while HistoCore outperforms other Index2core methods and even surpasses PeelOne on several datasets. The results demonstrate that carefully designed, synchronization-aware parallel strategies can close the gap between Peel and Index2core on GPUs, enabling scalable, high-performance k-core decomposition for large-scale graphs.
Abstract
Core decomposition is a well-established graph mining problem with various applications that involves partitioning the graph into hierarchical subgraphs. Solutions to this problem have been developed using both bottom-up and top-down approaches from the perspective of vertex convergence dependency. However, existing algorithms have not effectively harnessed GPU performance to expedite core decomposition, despite the growing need for enhanced performance. Moreover, how to approach the performance limits of core decomposition from these two directions within a parallel synchronization structure has not been thoroughly explored. This paper introduces PICO, an efficient GPU acceleration framework for the Peel and Index2core paradigms of k-core decomposition. We propose PeelOne, a Peel-based algorithm designed to simplify the parallel logic and minimize atomic operations by eliminating vertices that are 'under-core'. We also propose an Index2core-based algorithm, named HistoCore, which addresses the issue of extensive redundant computations across both vertices and edges. Extensive experiments on an NVIDIA RTX 3090 GPU show that PeelOne outperforms all other Peel-based algorithms, and HistoCore outperforms all other Index2core-based algorithms. Furthermore, HistoCore even outperforms PeelOne on six datasets, with speedups of 1.1x to 3.2x, which breaks the stereotype that the Index2core paradigm performs much worse than the Peel paradigm in a shared-memory parallel setting.
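For readers unfamiliar with the two paradigms, the following is a minimal sequential sketch in Python of what each one computes. It is not PeelOne or HistoCore, and it contains none of the GPU-specific techniques described above (under-core elimination, assertion-based atomics, histogram maintenance). The function names, the adjacency-dictionary input format, and the use of an h-index update for the Index2core-style iteration are assumptions chosen for illustration; the graph is assumed to be simple and undirected.

```python
from typing import Dict, Iterable, List

Graph = Dict[int, List[int]]  # hypothetical input format: vertex -> neighbor list


def peel_core_numbers(adj: Graph) -> Dict[int, int]:
    """Peel-style sketch: remove vertices with remaining degree <= k, for k = 0, 1, ..."""
    deg = {v: len(ns) for v, ns in adj.items()}
    core: Dict[int, int] = {}
    removed = set()
    k = 0
    while len(removed) < len(adj):
        frontier = [v for v in adj if v not in removed and deg[v] <= k]
        if not frontier:
            k += 1  # nothing left to peel at this level
            continue
        while frontier:
            v = frontier.pop()
            if v in removed:
                continue
            removed.add(v)
            core[v] = k
            for u in adj[v]:
                if u not in removed:
                    deg[u] -= 1
                    if deg[u] == k:  # u has just dropped into the current level
                        frontier.append(u)
    return core


def h_index(values: Iterable[int]) -> int:
    """Largest h such that at least h of the values are >= h."""
    h = 0
    for i, x in enumerate(sorted(values, reverse=True), start=1):
        if x >= i:
            h = i
        else:
            break
    return h


def index2core_core_numbers(adj: Graph) -> Dict[int, int]:
    """Index2core-style sketch: start from degrees and lower estimates until stable."""
    est = {v: len(ns) for v, ns in adj.items()}  # degree is an upper bound on coreness
    changed = True
    while changed:
        changed = False
        for v, ns in adj.items():
            new = h_index(est[u] for u in ns)
            if new < est[v]:
                est[v] = new
                changed = True
    return est


# Triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
example: Graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
peel_result = peel_core_numbers(example)
index_result = index2core_core_numbers(example)
```

On this small example both functions assign core number 2 to the triangle vertices and 1 to the pendant vertex. The Peel sketch settles each vertex once, in increasing order of k, while the Index2core-style sketch may revisit a vertex and its edges several times before the estimates stop changing. That repeated work is the kind of redundant computation the abstract says HistoCore is designed to reduce.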
