Parallel $k$-Core Decomposition with Batched Updates and Asynchronous Reads

Quanquan C. Liu; Julian Shun; Igor Zablotchi

TL;DR

This work addresses maintaining a dynamic $k$-core decomposition under frequent graph updates while prioritizing read latency. It introduces the concurrent parallel level data structure (CPLDS), which lets asynchronous reads proceed concurrently with batched updates by tracking causal dependencies in dependency DAGs and making all level changes in the same DAG appear atomic to readers. The approach preserves a $(2+\epsilon)$-approximation to coreness and guarantees linearizability and liveness for both reads and updates. On a 30-core machine, it reduces read latency by up to a factor of $4.05 \cdot 10^{5}$ relative to the batch-dynamic algorithm, with modest update overhead. Compared to an unsynchronized (non-linearizable) baseline, read latency is at most $3.21$ times higher while coreness estimates are up to $52.7$ times more accurate, and update times remain competitive, which makes CPLDS practical for read-dominated workloads on dynamic graphs.

Abstract

Maintaining a dynamic $k$-core decomposition is an important problem that identifies dense subgraphs in dynamically changing graphs. Recent work by Liu et al. [SPAA 2022] presents a parallel batch-dynamic algorithm for maintaining an approximate $k$-core decomposition. In their solution, both reads and updates need to be batched, and therefore each type of operation can incur high latency waiting for the other type to finish. To tackle most real-world workloads, which are dominated by reads, this paper presents a novel hybrid concurrent-parallel dynamic $k$-core data structure where asynchronous reads can proceed concurrently with batches of updates, leading to significantly lower read latencies. Our approach is based on tracking causal dependencies between updates, so that causally related groups of updates appear atomic to concurrent readers. Our data structure guarantees linearizability and liveness for both reads and updates, and maintains the same approximation guarantees as prior work. Our experimental evaluation on a 30-core machine shows that our approach reduces read latency by orders of magnitude compared to the batch-dynamic algorithm, by up to a factor of $4.05 \cdot 10^{5}$. Compared to an unsynchronized (non-linearizable) baseline, our read latency is at most a factor of $3.21$ higher, while the accuracy of coreness estimates improves by up to a factor of $52.7$.

Paper Structure (23 sections, 4 theorems, 7 figures, 1 table)

Introduction
Preliminaries
Background
Level Data Structure (LDS)
Parallel LDS (PLDS)
Algorithm Overview
Detailed Algorithm
Data Structures and Global State
Updates
Reads
Correctness
Safety (Linearizability)
Linearization Points are Sound
Liveness
Approximation Guarantees
...and 8 more sections

Key Result

lemma 1

Let $\hat{k}(v)$ be the coreness estimate and $k(v)$ be the coreness of $v$, respectively. If $k(v) > \left(2 + 3/\lambda\right)(1+\delta)^{g'}$, then $\hat{k}(v) \geq (1+\delta)^{g'}$. Otherwise, if $k(v) < \frac{(1+\delta)^{g'}}{\left(2 + 3/\lambda\right)(1+\delta)}$, then $\hat{k}(v) < (1+\delta) \ldots$

Figures (7)

Figure 1: A PLDS and a dependency DAG in which $v$'s and $w$'s level changes are indirectly caused by the level change of $u$. In any sequential execution, the operation that causes the level of $u$ to change also changes the levels of $v$ and $w$. Thus, it is impossible in any sequential execution for a read to return the old level of $u$, $v$, or $w$ after another read has already returned the new level of one of these vertices. To ensure linearizability, our algorithm must therefore guarantee that level changes to vertices in the same DAG appear to take effect atomically to concurrent readers.
Figure 2: The insertion batch is shown in red. The batch causes the yellow, green, blue, and purple vertices to move up one level with the created dependency DAG shown below. Then, the green, blue and purple vertices continue moving up the levels. Finally, the green, blue, and purple vertices cause the gray vertex to move up a level. Since the green, blue, and purple vertices are all in the same dependency DAG, the gray vertex points to the root (the blue vertex).
Figure 3: Comparison of the average, $99$-th percentile, and $99.99$-th percentile read latencies of the implementations under batches of insertions or deletions. The $y$-axis is in log-scale. Twitter times out for SyncReads and we do not show their results.
Figure 4: Comparison of the latencies over different insertion batch sizes using $15$ update threads and $15$ read threads. The $y$-axis is in log-scale. We tested on yt and dblp.
Figure 5: Comparison of the average and maximum batch update time over all batches and trials using $15$ update threads and $15$ read threads. The $y$-axis is in log-scale. Twitter times out for SyncReads and we do not show their results.
...and 2 more figures
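
The atomicity requirement described in the captions of Figures 1 and 2 can be illustrated with a small toy model. The sketch below is single-threaded and is not the paper's implementation: the names `Vertex`, `mark_moved`, `finish_dag`, and `read_level` are hypothetical, every move is assumed to raise a level by exactly one, and merging of two DAGs that reach the same vertex is ignored. It only shows the idea that a reader keeps returning a vertex's old level until the root of its dependency DAG is marked finished, so all level changes in one DAG become visible together.

```python
class Vertex:
    """Toy vertex record (hypothetical fields, for illustration only)."""

    def __init__(self, name, level):
        self.name = name
        self.level = level       # live level, mutated by the update batch
        self.old_level = level   # level before the current batch moved this vertex
        self.dag_root = None     # root of the dependency DAG this vertex joined
        self.done = True         # used on roots only: has the whole DAG finished?


def mark_moved(v, cause=None):
    """Move v up one level; cause=None makes v the root of a new DAG."""
    if v.dag_root is None:                  # first move of v in this batch
        v.old_level = v.level
        if cause is None:
            v.dag_root = v
            v.done = False                  # the new DAG is still in progress
        else:
            v.dag_root = cause.dag_root     # point directly at the root
    v.level += 1


def finish_dag(root):
    """Publish every level change in root's DAG at once."""
    root.done = True


def read_level(v):
    """Return the old level while v's DAG is in progress, else the live level."""
    root = v.dag_root
    if root is not None and not root.done:
        return v.old_level
    return v.level
```

For example, if `u` is moved as a root, `v` is moved because of `u`, and `w` is moved twice because of `v`, then `read_level` returns the pre-batch level for all three until `finish_dag(u)` is called, after which all three reads return the new levels. A real concurrent version would need atomic reads and writes of these fields, which this sketch does not attempt.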

Theorems & Definitions (12)

definition 1: $k$-Core
definition 2: $k$-Core Decomposition
definition 3: $c$-Approximate $k$-Core Decomposition
definition 4: Coreness Estimate
lemma 1
theorem 1
definition 5
lemma 2
definition 6
definition 7
...and 2 more
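
As a concrete reference point for the definitions listed above, the following is a minimal sketch of the classic sequential peeling procedure for exact coreness on a static graph. It is not the paper's algorithm: it repeatedly removes a minimum-degree vertex and is written for clarity rather than speed. The helper `is_c_approximate` assumes the common two-sided form of a $c$-approximation, $k(v)/c \le \hat{k}(v) \le c \cdot k(v)$; both function names are hypothetical.

```python
from collections import defaultdict


def core_numbers(edges):
    """Exact coreness of every vertex of an undirected graph via peeling."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:                      # ignore self-loops
            adj[a].add(b)
            adj[b].add(a)
    degree = {v: len(ns) for v, ns in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel a vertex of minimum remaining degree (linear scan for simplicity).
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])           # coreness never decreases during peeling
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core


def is_c_approximate(core, estimate, c):
    """Check k(v)/c <= estimate(v) <= c*k(v) for every vertex (assumed form)."""
    return all(core[v] / c <= estimate[v] <= c * core[v] for v in core)
```

On a triangle with one pendant vertex attached, the three triangle vertices get coreness 2 and the pendant vertex gets coreness 1. Any estimate that stays within a factor of $c$ of these values in both directions passes the check.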
