Parallel $k$-Core Decomposition with Batched Updates and Asynchronous Reads
Quanquan C. Liu, Julian Shun, Igor Zablotchi
TL;DR
This work addresses maintaining a dynamic $k$-core decomposition under frequent graph updates while prioritizing read latency. It introduces the concurrent parallel level data structure (CPLDS), which allows asynchronous reads to proceed concurrently with batched updates by tracking causal dependencies among updates in dependency DAGs and enforcing a DAG-atomicity rule, so that each group of causally related updates appears atomic to readers. The approach preserves a $(2+\epsilon)$-approximation to coreness and guarantees linearizability and liveness for both reads and updates. On a 30-core machine, it reduces read latency by up to a factor of $4.05\cdot 10^5$ relative to the batch-dynamic algorithm, with modest update overhead. Empirically, CPLDS produces significantly more accurate coreness estimates than non-linearizable baselines while maintaining competitive update times, highlighting its practicality for read-dominated workloads in dynamic graphs.
Abstract
Maintaining a dynamic $k$-core decomposition is an important problem that identifies dense subgraphs in dynamically changing graphs. Recent work by Liu et al. [SPAA 2022] presents a parallel batch-dynamic algorithm for maintaining an approximate $k$-core decomposition. In their solution, both reads and updates need to be batched, and therefore each type of operation can incur high latency waiting for the other type to finish. To serve real-world workloads, most of which are dominated by reads, this paper presents a novel hybrid concurrent-parallel dynamic $k$-core data structure where asynchronous reads can proceed concurrently with batches of updates, leading to significantly lower read latencies. Our approach is based on tracking causal dependencies between updates, so that causally related groups of updates appear atomic to concurrent readers. Our data structure guarantees linearizability and liveness for both reads and updates, and maintains the same approximation guarantees as prior work. Our experimental evaluation on a 30-core machine shows that our approach reduces read latency by orders of magnitude compared to the batch-dynamic algorithm, by up to a factor of $4.05 \cdot 10^{5}$. Compared to an unsynchronized (non-linearizable) baseline, our read latency is higher by at most a factor of $3.21$, while the accuracy of our coreness estimates improves by up to a factor of $52.7$.
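For readers unfamiliar with coreness, the quantity being approximated can be illustrated with a minimal sketch. The code below is not the CPLDS and not the batch-dynamic algorithm of Liu et al.; it is the textbook static, sequential peeling procedure, written for clarity rather than speed. The function name `coreness` and the edge-list input format are assumptions made for this illustration only.

```python
from collections import defaultdict


def coreness(edges):
    """Exact coreness of every vertex via minimum-degree peeling.

    Illustrative only: static and sequential, quadratic in the number
    of vertices. `edges` is an iterable of undirected (u, v) pairs.
    """
    # Build an undirected adjacency structure, ignoring self-loops.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel a vertex of minimum remaining degree.
        v = min(remaining, key=degree.__getitem__)
        # The coreness is the largest minimum degree seen so far.
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core


# A triangle (0, 1, 2) with a pendant vertex 3 attached to vertex 0.
example = coreness([(0, 1), (1, 2), (0, 2), (0, 3)])
```

In this example the triangle vertices have coreness 2 and the pendant vertex has coreness 1. An approximate dynamic structure such as the one described above returns estimates within a multiplicative factor of these exact values instead of recomputing them after every update.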
