Revisiting the Design of In-Memory Dynamic Graph Storage

Jixian Su, Chiyu Hao, Shixuan Sun, Hao Zhang, Sen Gao, Jiaxin Jiang, Yao Chen, Chenyi Zhang, Bingsheng He, Minyi Guo

TL;DR

The paper revisits in-memory dynamic graph storage (DGS) by proposing a common abstraction and a generic test framework for comparing diverse DGS methods fairly. It benchmarks five DGS approaches against CSR/AdjLst on real-world and synthetic graphs, revealing substantial memory overhead and contention in fine-grained methods, while coarse-grained approaches such as Aspen mitigate some concurrency costs at a cost in memory or scalability. Key findings show that dynamic arrays excel for vertex access, segmented neighbor indexes improve insert performance on large neighbor sets, and adaptive indexing helps on real-world graphs, yet a persistent gap remains between DGS and static CSR in read efficiency and memory footprint. The work provides actionable design guidance and highlights hardware-aware optimizations as critical for practical real-time graph analytics.

Abstract

The effectiveness of in-memory dynamic graph storage (DGS) for supporting concurrent graph read and write queries is crucial for real-time graph analytics and updates. Various methods have been proposed, including LLAMA, Aspen, LiveGraph, Teseo, and Sortledton. These approaches differ significantly in their support for read and write operations, space overhead, and concurrency control. However, there has been no systematic study of the trade-offs among these dimensions. In this paper, we evaluate the effectiveness of individual techniques and identify the performance factors affecting these storage methods by proposing a common abstraction for DGS design and implementing a generic test framework based on this abstraction. Our findings highlight several key insights: 1) Existing DGS methods exhibit substantial space overhead. For example, Aspen consumes 3.3-10.8x more memory than CSR, while the optimal fine-grained methods consume 4.1-8.9x more memory than CSR. 2) Existing methods often overlook the impact of memory access on modern architectures, leading to performance degradation compared to continuous storage methods. 3) Fine-grained concurrency control methods, in particular, suffer from severe efficiency and space issues because they maintain versions and perform checks for each neighbor. These methods also experience significant contention on high-degree vertices. Our systematic study reveals these performance bottlenecks and outlines future directions to improve DGS for real-time graph analytics.

Paper Structure

This paper contains 32 sections, 1 theorem, 1 equation, 21 figures, 15 tables.

Key Result

Lemma 1

Suppose DGS maintains the serializability of write queries with the serial execution order $\Delta \mathcal{G}$. Given a read query $Q$ starting at timestamp $i$, ensuring $Q$ has a consistent view of $G_i = G_0 \oplus \cdots \oplus \Delta G_i$ guarantees global serializable isolation for read and write
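
The idea behind the lemma, that a read query starting at timestamp $i$ should observe exactly the graph $G_i$ produced by the first $i$ writes, can be illustrated with a small single-threaded sketch. This is not the implementation of any evaluated system: the class `VersionedGraph`, its method names, and the per-edge `[neighbor, begin_ts, end_ts]` record layout are hypothetical, chosen only to show how per-neighbor versions let a reader filter out updates committed after its start timestamp.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedGraph:
    # Hypothetical layout: vertex -> list of [neighbor, begin_ts, end_ts].
    # end_ts stays None while the edge version is still live.
    adj: dict = field(default_factory=dict)
    # Timestamp of the last applied write, i.e. the index i of G_i.
    ts: int = 0

    def insert_edge(self, u, v):
        """Apply one write (Delta G_i) that adds edge (u, v); return its timestamp."""
        self.ts += 1
        self.adj.setdefault(u, []).append([v, self.ts, None])
        return self.ts

    def delete_edge(self, u, v):
        """Apply one write that ends the live version of (u, v), if any.

        The timestamp advances even when no live version exists.
        """
        self.ts += 1
        for rec in self.adj.get(u, []):
            if rec[0] == v and rec[2] is None:
                rec[2] = self.ts
                break
        return self.ts

    def neighbors(self, u, read_ts):
        """Return N(u) as seen in G_{read_ts}: versions created at or before
        read_ts and not yet ended at read_ts."""
        return sorted(
            v
            for v, begin, end in self.adj.get(u, [])
            if begin <= read_ts and (end is None or end > read_ts)
        )


g = VersionedGraph()
t1 = g.insert_edge(1, 2)
t2 = g.insert_edge(1, 3)
t3 = g.delete_edge(1, 2)

# A reader that started at t2 keeps seeing G_2 even after the delete at t3.
view_at_t2 = g.neighbors(1, t2)
view_at_t3 = g.neighbors(1, t3)
```

The per-neighbor version check in `neighbors` is the kind of cost the abstract attributes to fine-grained methods: every scan inspects a version for each neighbor, and every edge record carries extra timestamp fields.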

Figures (21)

  • Figure 1: Comparison of DGS methods from previous experiments. An edge from $x$ to $y$ indicates that $x$’s experiments include $y$. Shaded methods are transactional approaches.
  • Figure 2: The abstraction of graph query and data.
  • Figure 3: The abstraction of graph operations.
  • Figure 4: The neighbor index of $N(u_2)$ in LiveGraph.
  • Figure 5: The neighbor index of $N(u_2)$ in Sortledton.
  • ...and 16 more figures

Theorems & Definitions (1)

  • Lemma 1